GEO Accession viewer

Sample GSM1847530

Status

Public on Jul 19, 2016

Title

DarrowHuntley-2015-HIC011

Sample type

SRA

Source name

Retinal Pigmented Epithelial Cells

Organism

Homo sapiens

Characteristics

cell line: RPE1-deltaDXZ4i
cell type: Retinal Pigmented Epithelial Cells
protocol: in situ Hi-C

Growth protocol

Cell lines were cultured according to the manufacturer's instructions.

Extracted molecule

genomic DNA

Extraction protocol

Cells were crosslinked and then lysed, leaving the nuclei permeabilized but still intact. DNA was then restricted with MboI and the overhangs were filled in, incorporating a biotinylated base. Free ends were then ligated together in situ. Crosslinks were reversed, the DNA was sheared, and the biotinylated ligation junctions were recovered with streptavidin beads.
A standard Illumina library construction protocol was performed, and libraries were sequenced on the HiSeq X Ten/NextSeq/HiSeq2500 following the manufacturer's protocols.

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

HiSeq X Ten

Description

Processed data files were not available at the time of accessioning.

Data processing

The paired-end reads were aligned separately using BWA against the b37 (human), mm10 (mouse), or rheMac2 (rhesus macaque) reference genome.
PCR duplicates, reads with low mapping quality, and unligated reads were removed using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Contact matrices were constructed at various resolutions and normalized using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
genome build: b37 (human), mm10 (mouse), rheMac2 (rhesus macaque)
processed data files format and content: Contact matrices: a text file with the raw observed contact matrix in sparse matrix notation at a given resolution. Only the upper triangle of the matrix is provided (i.e. i<=j); because the matrix is symmetric, M_i,j = M_j,i. At this stage of processing, read pairs where one or both ends do not align to the reference genome have already been removed, as have chimeric ambiguous reads (see Section II.a.2 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a definition of chimeric ambiguous reads). In addition, duplicate reads (reads where both ends align to within +/- 4bp of each other) have been removed (see Section II.a.3 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a full description of duplicate removal). Full details of the Hi-C processing pipeline used in this study are provided in Section II.a of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014.
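As a rough illustration of the upper-triangle convention described above, the sketch below loads sparse records into a dictionary and answers lookups for either (i, j) or (j, i). It assumes each record is three whitespace-separated columns (row coordinate, column coordinate, count) and that the coordinates are bin start positions in base pairs; neither detail is stated in this record, and the function names and example values are hypothetical.

```python
def load_upper_triangle(lines, resolution):
    """Read sparse records into a dict keyed by (row_bin, col_bin), row_bin <= col_bin.

    Assumption: each record is "i j value", with i and j given as bin start
    positions in bp, so dividing by the resolution yields a bin index.
    """
    contacts = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        i = int(fields[0]) // resolution
        j = int(fields[1]) // resolution
        contacts[(min(i, j), max(i, j))] = float(fields[2])
    return contacts


def contact(contacts, i, j):
    """Return M_i,j, using symmetry (M_i,j = M_j,i); absent entries count as 0."""
    return contacts.get((min(i, j), max(i, j)), 0.0)


# Synthetic records at a hypothetical 5000 bp resolution (not real data).
example = ["0\t0\t12.0", "0\t5000\t3.0", "5000\t10000\t7.0"]
contacts = load_upper_triangle(example, resolution=5000)
```

Keeping only the upper triangle in memory mirrors the file layout and halves storage; the lookup helper restores the symmetric view on demand.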
processed data files format and content: Normalization files: normalization vectors that can be used to transform the raw contact matrices M into normalized matrices M*. Each file is ordered such that the first line of the normalization vector file is the norm factor for the first row/column of the corresponding raw contact matrix, the second line is the factor for the second row/column of the contact matrix, and so on. To normalize an entry M_i,j in a *RAWobserved file, divide the entry by the corresponding norm factors for i and j. (See Section II.b of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for more information about the different types of normalizations.)
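The normalization step can be sketched as follows. This reads "divide by the norm factors for i and j" as dividing by the product of the two factors, i.e. M*_i,j = M_i,j / (norm[i] * norm[j]); that reading, the 0-based bin indices, the NaN handling, and the function names are assumptions for illustration rather than details taken from this record.

```python
import math


def load_norm_vector(lines):
    """Line k of the vector file is the factor for row/column k (0-based here)."""
    return [float(line) for line in lines if line.strip()]


def normalize_entry(raw_value, i, j, norm):
    """Return M*_i,j = M_i,j / (norm[i] * norm[j]).

    Assumption: an undefined (NaN) or zero factor makes the entry undefined,
    so NaN is returned instead of raising.
    """
    denominator = norm[i] * norm[j]
    if math.isnan(denominator) or denominator == 0.0:
        return math.nan
    return raw_value / denominator


# Synthetic factors (not real data): bins 0 and 1 are usable, bin 2 is not.
norm = load_norm_vector(["2.0", "4.0", "NaN"])
normalized = normalize_entry(8.0, 0, 1, norm)
```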
processed data files format and content: HiCCUPS_looplist.txt files contain loop calls generated via HiCCUPS; the first three fields represent the locus participating in the loop closer to the p-end of the chromosome; fields 4-6 represent the locus participating in the loop closer to the q-end of the chromosome; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 represents the observed number of counts at the loop; fields 9-12 represent the expected number of counts at the loop using four different expected models; fields 13-16 are the q-values over each of the expected values; field 17 is the number of enriched pixels that were clustered into a particular loop; fields 18-19 are the centroid of the loop; field 20 is the radius of the loop.
processed data files format and content: Arrowhead_domainlist.txt files contain domain calls generated via Arrowhead; the first 6 fields represent the boundaries of the domain; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 is the corner score for the domain (see Rao, Huntley, et al., Cell 2014); fields 9-12 are the component scores used in the Arrowhead algorithm (see Rao, Huntley, et al., Cell 2014).
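To make the 20-field loop layout easier to follow, here is a minimal parsing sketch. It assumes the fields are tab-separated and that each three-field locus is a chromosome name followed by integer start and end coordinates; the class, attribute names, and the example line are hypothetical, and a real file may begin with a header line that would need to be skipped first.

```python
from typing import NamedTuple, Tuple


class Loop(NamedTuple):
    locus_p: Tuple[str, int, int]    # fields 1-3, locus closer to the p-end
    locus_q: Tuple[str, int, int]    # fields 4-6, locus closer to the q-end
    color: str                       # field 7, Juicebox display color
    observed: float                  # field 8
    expected: Tuple[float, ...]      # fields 9-12, four expected models
    q_values: Tuple[float, ...]      # fields 13-16
    clustered_pixels: int            # field 17
    centroid: Tuple[float, float]    # fields 18-19
    radius: float                    # field 20


def parse_loop(line):
    """Split one loop-list record into named fields (tab-separated is assumed)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 20:
        raise ValueError(f"expected 20 fields, got {len(f)}")
    return Loop(
        locus_p=(f[0], int(f[1]), int(f[2])),
        locus_q=(f[3], int(f[4]), int(f[5])),
        color=f[6],
        observed=float(f[7]),
        expected=tuple(float(x) for x in f[8:12]),
        q_values=tuple(float(x) for x in f[12:16]),
        clustered_pixels=int(float(f[16])),
        centroid=(float(f[17]), float(f[18])),
        radius=float(f[19]),
    )


# Synthetic record for illustration only (not taken from any real loop list).
example_line = "\t".join([
    "chr1", "1000", "2000", "chr1", "9000", "10000", "0,255,255", "50",
    "10.5", "11.0", "9.5", "12.0", "0.001", "0.002", "0.003", "0.004",
    "4", "1500", "9500", "500",
])
loop = parse_loop(example_line)
```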
processed data files format and content: merged_nodups.txt files contain filtered, "normal" contacts. Each line represents a single Hi-C read pair that has passed the alignment and duplicate removal stages. The format of each line of the file is: read_name, strand1, chromosome1, position1, fragment-index1, strand2, chromosome2, position2, fragment-index2, mapq1, mapq2.
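The per-line layout above maps naturally onto a small streaming parser, which matters for a file of this size. The sketch below follows the field order exactly as listed in this record and assumes whitespace-separated columns; the strand encoding is not specified here, so strands are kept as strings. Function names and the example line are hypothetical.

```python
import gzip

# Field order as listed in this record.
FIELDS = (
    "read_name", "strand1", "chromosome1", "position1", "fragment_index1",
    "strand2", "chromosome2", "position2", "fragment_index2", "mapq1", "mapq2",
)
INT_FIELDS = (
    "position1", "fragment_index1", "position2", "fragment_index2",
    "mapq1", "mapq2",
)


def parse_read_pair(line):
    """Turn one merged_nodups line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        record[name] = int(record[name])
    return record


def iter_read_pairs(path):
    """Stream read pairs from a gzipped file without loading it into memory."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_read_pair(line)


# Synthetic line for illustration only (not taken from the real file).
pair = parse_read_pair("read_0001 0 1 10500 3 16 1 98000 41 30 42")
```

A caller could, for example, iterate `iter_read_pairs(...)` and keep only pairs where both `mapq1` and `mapq2` exceed a chosen threshold; the threshold itself is a user choice and is not given in this record.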
processed data files format and content: collisions.txt.gz files contain the contacts that have 3 or more loci.

Submission date

Aug 11, 2015

Last update date

May 15, 2019

Contact name

Miriam Huntley

E-mail(s)

mhuntley@fas.harvard.edu

Organization name

Harvard University

Street address

29 Oxford Street

City

Cambridge

State/province

ZIP/Postal code

02138

Country

USA

Platform ID

GPL20795

Series (1)

GSE71831

Deletion of DXZ4 on the human inactive X chromosome eliminates superdomains and impairs gene silencing

Relations

BioSample

SAMN03979968

SRA

SRX1165020

Supplementary file	Size	Download	File type/resource
GSM1847530_DarrowHuntley-2015-HIC011_merged_nodups.txt.gz	36.0 Gb	(ftp)(http)	TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record