Sample GSM1181868
Status Public on Nov 03, 2013
Title Hi-C, GM12878 Lymphoblastoid cells, replicate two
Sample type SRA
 
Source name GM12878 Lymphoblastoid cells
Organism Homo sapiens
Characteristics cell line: GM12878 Lymphoblastoid cell line
Biomaterial provider Coriell; http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM12878
Treatment protocol None
Growth protocol GM12878 cells (Coriell) were cultured in suspension in 85% RPMI media supplemented with 15% fetal bovine serum and 1X penicillin/streptomycin.
see samples section
Extracted molecule genomic DNA
Extraction protocol Hi-C experiments were conducted using HindIII according to a previous publication (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-93 (2009)).
Sequencing libraries were constructed according to a previous publication (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-93 (2009)).
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2500
 
Description GM12878_lcp.vcf
GM12878_depristoeal.vcf
GM12878_seed.haps
Data processing fastq: Illumina's HiSeq Control Software
For Hi-C read alignment, we aligned Hi-C reads to the mm9 (mouse) or the hg18 (human) genome. In each case, we masked any bases in the genome that were genotyped as SNPs in either Mus musculus castaneus or S129/SvJae (for mouse) or GM12878 (for human). These bases were masked to "N" in order to reduce reference-bias mapping artifacts. Hi-C reads were aligned iteratively as single-end reads using Novoalign. Specifically, for iterative alignment, we first aligned the entire sequencing read to either the mouse or human genome. Unmapped reads were then trimmed by 5 base pairs and realigned. This process was repeated until the read successfully aligned to the genome or until the trimmed read was less than 25 base pairs long. After iterative mapping was finished, read pairs were reconstructed from single reads using an in-house pipeline. Unmapped reads were filtered out and PCR duplicate reads were removed. Final alignment files were then processed using the GATK pipeline, specifically using Indel Realignment and Variant Recalibration.
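The trim-and-realign loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the submitters' pipeline: the `align` callable is a hypothetical stand-in for a call to an aligner such as Novoalign, and trimming from the 3' end is an assumption, since the record does not say which end is trimmed. The 5 bp step and the 25 bp minimum length are taken from the record.

```python
from typing import Callable, Optional, Tuple

TRIM_STEP = 5     # from the record: bases removed per round
MIN_LENGTH = 25   # from the record: stop once the read is shorter than this


def iterative_align(
    read: str,
    align: Callable[[str], Optional[str]],
    trim_step: int = TRIM_STEP,
    min_length: int = MIN_LENGTH,
) -> Optional[Tuple[str, int]]:
    """Align a read, trimming it until it maps or becomes too short.

    `align` is a hypothetical wrapper around an aligner: it returns a
    mapping location (any string) or None if the read did not map.
    Returns (location, aligned_length) on success, otherwise None.
    """
    current = read
    while len(current) >= min_length:
        hit = align(current)
        if hit is not None:
            return hit, len(current)
        # Assumption: trim from the 3' end; the record does not specify.
        current = current[:-trim_step]
    return None
```

With a toy `align` function that only maps sequences of 40 bases or fewer, a 50-base read would be tried at 50, 45, and then 40 bases before it maps.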
Haplotypes were generated from the final aligned BAM file after merging the two biological replicates using the HapCUT algorithm. The details of HapCUT are described previously (Bansal and Bafna, Bioinformatics 24, i153-159, 2008).
Genome_build: mm9
Genome_build: hg18
Supplementary_files_format_and_content: The castx129_variants.vcf and GM12878_depristoeal.vcf are VCF format files of the variants used as input to the haplotyping algorithm. Both of these files are derived from publicly available datasets: the castx129_variants.vcf file is derived from data downloaded from the ENA (ERP000042) and the SRA (SRX037820), and the GM12878_depristoeal.vcf file was downloaded from the 1000 Genomes Project.
Supplementary_files_format_and_content: The F123.haps and GM12878_seed.haps are modified BED format files. In these files, the first column is the chromosome, and the second column is the location of the variant. The third and fourth columns are the phased variants in the "A" and "B" haplotypes. The choice of "A" and "B" is arbitrary, and it should be noted that the "A" haplotype from one chromosome is not necessarily derived from the same parent as the "A" haplotype from a different chromosome.
Supplementary_files_format_and_content: The GM12878_lcp.vcf file is a VCF format file produced after local conditional phasing of variants in the seed haplotype.
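As an illustration of the four-column .haps layout described above (chromosome, position, "A" allele, "B" allele), a minimal reader might look like the sketch below. It assumes whitespace-delimited columns, integer positions, and no header row, none of which the record states explicitly; the `PhasedVariant` and `parse_haps` names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class PhasedVariant(NamedTuple):
    chrom: str     # column 1: chromosome
    pos: int       # column 2: location of the variant
    allele_a: str  # column 3: allele on the arbitrary "A" haplotype
    allele_b: str  # column 4: allele on the arbitrary "B" haplotype


def parse_haps(lines: Iterable[str]) -> Iterator[PhasedVariant]:
    """Yield one PhasedVariant per data line of a .haps file.

    Assumptions (not confirmed by the record): columns are separated
    by whitespace, and blank or '#'-prefixed lines can be skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, pos, allele_a, allele_b = fields[:4]
        yield PhasedVariant(chrom, int(pos), allele_a, allele_b)
```

Because the "A" and "B" labels are assigned independently for each chromosome, downstream code should compare haplotype labels only between variants that share the same `chrom` value.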
 
Submission date Jul 08, 2013
Last update date Feb 22, 2021
Contact name Jesse R Dixon
E-mail(s) jedixon@salk.edu
Organization name Salk Institute for Biological Studies
Lab PBL-D
Street address 10010 N. Torrey Pines Rd.
City La Jolla
State/province CA
ZIP/Postal code 92037
Country USA
 
Platform ID GPL16791
Series (1)
GSE48592 Whole-genome Haplotype Reconstruction using Proximity-ligation and Shotgun Sequencing
Relations
Reanalyzed by GSE85977
Reanalyzed by GSE87112
Reanalyzed by GSE115407
Reanalyzed by GSE128678
Reanalyzed by GSE167200
BioSample SAMN02228121
SRA SRX318777

Supplementary data files not provided
Processed data are available on Series record
Raw data are available in SRA
