Sample GSM1847530
Status Public on Jul 19, 2016
Title DarrowHuntley-2015-HIC011
Sample type SRA
 
Source name Retinal Pigmented Epithelial Cells
Organism Homo sapiens
Characteristics cell line: RPE1-deltaDXZ4i
cell type: Retinal Pigmented Epithelial Cells
protocol: in situ Hi-C
Growth protocol Cell lines were cultured according to manufacturer's instructions
Extracted molecule genomic DNA
Extraction protocol Cells were crosslinked and then lysed with nuclei permeabilized but still intact. DNA was then restricted with MboI and the overhangs filled in incorporating a biotinylated base. Free ends were then ligated together in situ. Crosslinks were reversed, the DNA was sheared and then biotinylated ligation junctions were recovered with streptavidin beads. 
Standard Illumina library construction protocol was performed, and libraries were sequenced on the HiSeq X Ten/NextSeq/HiSeq2500 following the manufacturer's protocols.
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model HiSeq X Ten
 
Description Processed data files were not available at the time of accessioning.
Data processing The paired-end reads were aligned separately using BWA against the b37 (human), mm10 (mouse), or rheMac2 (rhesus macaque) reference genome.
PCR duplicates, low-mapping-quality reads, and unligated reads were removed using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Contact matrices were constructed at various resolutions and normalized using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
genome build: b37 (human), mm10 (mouse), rheMac2 (rhesus macaque)
processed data files format and content: Contact matrices: a text file with the raw observed contact matrix in sparse matrix notation at a given resolution. Only the upper triangle of the matrix is provided (i.e. i<=j); the matrix is symmetric, so M_i,j = M_j,i. At this stage of processing, read pairs where one or both ends do not align to the reference genome have already been removed, as have chimeric ambiguous reads (see Section II.a.2 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a definition of chimeric ambiguous reads). Duplicate reads (reads where both ends align to within +/- 4bp of each other) have also been removed (see Section II.a.3 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a full description of duplicate removal). Full details of the Hi-C processing pipeline used in this study are provided in Section II.a of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014.
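As an illustration of the upper-triangle convention described above, the following minimal Python sketch loads sparse entries and answers lookups for either index order. It assumes each line holds three whitespace-separated fields (row index, column index, count); that column layout, and the helper names `load_upper_triangle` and `contact`, are assumptions made for this example and are not specified in the record. If the indices in a real file are genomic coordinates rather than bin numbers, they would first need to be converted using the matrix resolution.

```python
def load_upper_triangle(lines):
    """Parse 'i j value' lines into a dict keyed by (i, j) with i <= j.

    Assumed layout (hypothetical): three whitespace-separated fields per line.
    """
    matrix = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        i_str, j_str, value_str = line.split()
        i, j = int(i_str), int(j_str)
        # Store every entry under an ordered key so that i <= j always holds.
        key = (i, j) if i <= j else (j, i)
        matrix[key] = float(value_str)
    return matrix


def contact(matrix, i, j):
    """Return M_i,j, using the symmetry M_i,j = M_j,i; absent entries are 0."""
    key = (i, j) if i <= j else (j, i)
    return matrix.get(key, 0.0)
```

For example, after `m = load_upper_triangle(open(path))` for some matrix file `path`, `contact(m, 7, 3)` and `contact(m, 3, 7)` return the same value even though only the (3, 7) entry is stored.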
processed data files format and content: Normalization files: normalization vectors that can be used to transform the raw contact matrices M into normalized matrices M*. Each file is ordered such that the first line of the normalization vector file is the norm factor for the first row/column of the corresponding raw contact matrix, the second line is the factor for the second row/column of the contact matrix, and so on. To normalize an entry M_i,j in a *RAWobserved file, divide the entry by the corresponding norm factors for i and j. (See Section II.b of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for more information about the different types of normalizations.)
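The normalization rule above (divide M_i,j by the norm factor for i and by the norm factor for j) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the vector file is taken to hold one number per line, i and j are treated as 0-based row/column indices (so index 0 is the first line of the vector), and the function names are hypothetical.

```python
def read_norm_vector(lines):
    """Read one norm factor per line; line k applies to row/column k (0-based)."""
    return [float(line) for line in lines if line.strip()]


def normalize_entry(value, i, j, norm):
    """Return M*_i,j = M_i,j / (norm[i] * norm[j])."""
    return value / (norm[i] * norm[j])


def normalize_sparse(entries, norm):
    """Normalize a sparse matrix given as a dict {(i, j): raw_count}."""
    return {
        (i, j): normalize_entry(value, i, j, norm)
        for (i, j), value in entries.items()
    }
```

With norm factors 2.0 and 4.0 for rows 0 and 1, a raw count of 8.0 at (0, 1) becomes 8.0 / (2.0 * 4.0) = 1.0.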
processed data files format and content: HiCCUPS_looplist.txt files contain loop calls generated via HiCCUPS; the first three fields represent the locus participating in the loop closer to the p-end of the chromosome; fields 4-6 represent the locus participating in the loop closer to the q-end of the chromosome; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 represents the observed number of counts at the loop; fields 9-12 represent the expected number of counts at the loop using four different expected models; fields 13-16 are the q-values for each of the four expected values; field 17 is the number of enriched pixels that were clustered into a particular loop; fields 18-19 are the centroid of the loop; field 20 is the radius of the loop.
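The 20-field loop record described above could be read with a small Python sketch such as the one below. Several details are assumptions for illustration only: that fields are tab-separated, that each locus is given as chromosome, start, end, and that any header line has already been skipped. The `Loop` type and `parse_loop` function are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Loop(NamedTuple):
    locus1: Tuple[str, int, int]   # fields 1-3: locus closer to the p-end
    locus2: Tuple[str, int, int]   # fields 4-6: locus closer to the q-end
    color: str                     # field 7: Juicebox display color
    observed: float                # field 8: observed counts
    expected: Tuple[float, ...]    # fields 9-12: four expected models
    q_values: Tuple[float, ...]    # fields 13-16: q-value per expected model
    num_pixels: int                # field 17: enriched pixels in the cluster
    centroid: Tuple[float, float]  # fields 18-19: loop centroid
    radius: float                  # field 20: loop radius


def parse_loop(line: str) -> Loop:
    """Parse one tab-separated loop line (delimiter is an assumption)."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 20:
        raise ValueError(f"expected 20 fields, got {len(f)}")
    return Loop(
        locus1=(f[0], int(f[1]), int(f[2])),
        locus2=(f[3], int(f[4]), int(f[5])),
        color=f[6],
        observed=float(f[7]),
        expected=tuple(float(x) for x in f[8:12]),
        q_values=tuple(float(x) for x in f[12:16]),
        num_pixels=int(float(f[16])),
        centroid=(float(f[17]), float(f[18])),
        radius=float(f[19]),
    )
```

A caller might then filter loops by requiring, for instance, that every value in `q_values` falls below a chosen threshold.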
processed data files format and content: Arrowhead_domainlist.txt files contain domain calls generated via Arrowhead; the first 6 fields represent the boundaries of the domain; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 is the corner score for the domain (see Rao, Huntley, et al., Cell 2014); fields 9-12 are the component scores used in the Arrowhead algorithm (see Rao, Huntley, et al., Cell 2014).
processed data files format and content: merged_nodups.txt files contain filtered, "normal" contacts. Each line represents a single Hi-C read pair that has passed the alignment and duplicate removal stages. The format of each line of the file is: read_name, strand1, chromosome1, position1, fragment-index1, strand2, chromosome2, position2, fragment-index2, mapq1, mapq2.
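A line in the merged_nodups layout listed above could be parsed with a minimal Python sketch like the following. It assumes the eleven fields are whitespace-separated and that positions, fragment indices, and mapping qualities are integers; the strand encoding is not specified in the record, so strands are kept as plain strings. The `Contact` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    read_name: str
    strand1: str
    chromosome1: str
    position1: int
    fragment_index1: int
    strand2: str
    chromosome2: str
    position2: int
    fragment_index2: int
    mapq1: int
    mapq2: int


def parse_contact(line: str) -> Contact:
    """Parse one read-pair line (whitespace delimiter is an assumption)."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return Contact(
        f[0], f[1], f[2], int(f[3]), int(f[4]),
        f[5], f[6], int(f[7]), int(f[8]),
        int(f[9]), int(f[10]),
    )


def is_intrachromosomal(c: Contact) -> bool:
    """True when both read ends map to the same chromosome."""
    return c.chromosome1 == c.chromosome2
```

Because the supplementary file is large (36.0 Gb compressed), a line-by-line approach of this kind, for example iterating over `gzip.open(path, "rt")`, avoids loading the whole file into memory.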
processed data files format and content: collisions.txt.gz files contain the contacts that involve 3 or more loci.
 
Submission date Aug 11, 2015
Last update date May 15, 2019
Contact name Miriam Huntley
E-mail(s) mhuntley@fas.harvard.edu
Organization name Harvard University
Street address 29 Oxford Street
City Cambridge
State/province MA
ZIP/Postal code 02138
Country USA
 
Platform ID GPL20795
Series (1)
GSE71831 Deletion of DXZ4 on the human inactive X chromosome eliminates superdomains and impairs gene silencing
Relations
BioSample SAMN03979968
SRA SRX1165020

Supplementary file Size Download File type/resource
GSM1847530_DarrowHuntley-2015-HIC011_merged_nodups.txt.gz 36.0 Gb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
