| 1000 Genomes Project Pilot 3, March 2010 SNP call release: |
| Samples: 697 total individuals from 7 HapMap populations. |
| 90 CEU, 66 TSI, 109 CHB, 107 CHD, 105 JPT, 108 LWK, 112 YRI. |
| Exon targets: The initial selection targeted 1,020 genes |
| (~10,000 exons or 2.3 Mb), using CCDS annotations. The current |
| data release includes SNP calls only within regions that were |
| successfully captured at all four data-producing centers. These |
| amount to 8,496 regions with total length ~1.43 Mb. The genes and |
| exact region boundaries relative to the NCBI version 36 genome |
| assembly are reported in the files |
| ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/pilot_data/technical/reference/P3_gene_list.txt |
| and /P3_consensus_exonic_targets.bed (UCSC .bed format file). |
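The region count and total target length quoted above can be checked against a BED file with a short script. The following is a minimal sketch, not part of the release tooling: the function name `bed_total_length` and the three example regions are hypothetical, and it assumes the regions do not overlap (overlapping regions would be counted twice).

```python
import io


def bed_total_length(lines):
    """Return (region_count, total_bases) for an iterable of BED lines."""
    count = 0
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        count += 1
        # BED coordinates: 0-based start, exclusive end, so length = end - start.
        total += end - start
    return count, total


# Hypothetical three-region example; these are NOT real Pilot 3 coordinates.
example = io.StringIO(
    "1\t100\t250\tGENE_A\n"
    "1\t400\t520\tGENE_A\n"
    "2\t1000\t1300\tGENE_B\n"
)
n_regions, n_bases = bed_total_length(example)
print(n_regions, n_bases)  # 3 570
```

To apply it to a downloaded copy of the targets file, pass an open file handle instead of the `io.StringIO` example.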
| The enrichment protocols will be described in a forthcoming paper. |
| DNA samples were sequenced with two sequencing technologies (454 |
| at BCM; Illumina at BI, WTSI, and WUGSC). Overall average sequence |
| coverage ranges from 30x to 60x per individual in each population. |
| Data processing pipelines: The present release is based on SNP |
| calls made at two analysis centers: Boston College (Amit Indap, |
| Wen Fung Leong, and Gabor Marth) and the Broad Institute |
| (Kiran Garimella and Chris Hartl). For pipeline details, see the |
| README for each individual call set. Summary information is below. |
| Boston College Pipeline: |
| Read mapper: MOSAIK |
| Duplicate removal: BCMMarkduplicates (454 data); Picard MarkDuplicates (Illumina data) |
| Base quality recalibration: GATK (Illumina data); none (454 data) |
| SNP caller: GigaBayes (BamBayes) |
| Version date: 2010 February 02 |
| Broad Institute Pipeline: |
| Read mapper: MAQ (Illumina data); SSAHA2 (454 data) |
| Duplicate removal: Picard MarkDuplicates (454 and Illumina data) |
| Base quality recalibration: GATK (454 and Illumina data) |
| SNP caller: UnifiedGenotyper |
| Version date: 2010 January 26 |
| Component SNP call characteristics: The Boston College SNP calls |
| are made using all 697 samples simultaneously. Per-population call |
| sets are derived by reporting, for a given population, the subset |
| of called sites that included a variant genotype in at least one of |
| the individuals in that population (i.e. sites that segregate in |
| that population). The Broad Institute SNP calls are made separately |
| within each of the 7 populations; the list of sites that segregate |
| in any of the 697 individuals is then produced as the union of the |
| population-specific calls. The final release set contains the SNP |
| (variant) sites present in both the Boston College (BC) and Broad |
| Institute (BI) call sets. If an individual's genotype differs |
| between the BC and BI call sets, or is missing in one set, the genotype |
| is reported as missing. The fraction of missing genotypes in each of |
| the 7 populations is: CEU 0.73%, TSI 0.64%, CHB 0.50%, CHD 0.82%, |
| JPT 0.99%, LWK 0.36%, YRI 0.52%. |
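The merging rules described above can be sketched in a few lines. This is an illustration only, not the code used by either analysis center. It assumes a hypothetical in-memory representation: each call set is a dict mapping a site key such as `(chrom, pos)` to a dict of sample name to genotype string (for example `"AG"`), with alleles already written in a consistent order. All function names are made up for the example.

```python
MISSING = None  # hypothetical encoding for a missing genotype


def merge_call_sets(bc_calls, bi_calls, samples):
    """Keep sites called in both sets; keep genotypes only where both agree."""
    merged = {}
    for site in sorted(bc_calls.keys() & bi_calls.keys()):
        bc, bi = bc_calls[site], bi_calls[site]
        merged[site] = {}
        for sample in samples:
            g_bc = bc.get(sample, MISSING)
            g_bi = bi.get(sample, MISSING)
            # A disagreement, or a genotype absent from either set, is missing.
            merged[site][sample] = g_bc if g_bc == g_bi else MISSING
    return merged


def missing_fraction(calls, samples):
    """Fraction of (site, sample) genotypes that are missing."""
    total = len(calls) * len(samples)
    missing = sum(1 for gts in calls.values() for s in samples if gts[s] is MISSING)
    return missing / total if total else 0.0


def segregating_sites(calls, ref_alleles, population_samples):
    """Sites where at least one listed sample carries a non-reference allele."""
    return {
        site
        for site, gts in calls.items()
        if any(
            gts.get(s) is not MISSING and any(a != ref_alleles[site] for a in gts[s])
            for s in population_samples
        )
    }


# Toy data with two samples; sites and genotypes are invented.
bc = {("1", 100): {"s1": "AG", "s2": "AA"},
      ("1", 200): {"s1": "CT", "s2": "CC"},
      ("2", 50): {"s1": "GG", "s2": "GT"}}
bi = {("1", 100): {"s1": "AG", "s2": "AG"},
      ("1", 200): {"s1": "CT"},
      ("3", 10): {"s1": "TT", "s2": "TC"}}
ref = {("1", 100): "A", ("1", 200): "C"}

merged = merge_call_sets(bc, bi, ["s1", "s2"])
print(sorted(merged))                            # [('1', 100), ('1', 200)]
print(missing_fraction(merged, ["s1", "s2"]))    # 0.5
print(sorted(segregating_sites(merged, ref, ["s1"])))
```

In the toy data, sample `s2` ends up missing at both shared sites: once because the two call sets disagree, and once because one set has no genotype for it.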
| SNP site statistics: |
| ALL (697 individuals) |
|   SNP sites in the BC/BI intersection = 12,761 |
|   SNP sites in dbSNP = 3,869 (30.32%) |
|   Transition:transversion ratio (Ts:Tv) = 3.81 |
| CEU (90 individuals) |
|   SNP sites in the BC/BI intersection = 3,489 |
|   SNP sites in dbSNP = 2,300 (65.92%) |
|   Transition:transversion ratio (Ts:Tv) = 3.47 |
| TSI (66 individuals) |
|   SNP sites in the BC/BI intersection = 3,281 |
|   SNP sites in dbSNP = 2,152 (65.59%) |
|   Transition:transversion ratio (Ts:Tv) = 3.54 |
| CHB (109 individuals) |
|   SNP sites in the BC/BI intersection = 3,415 |
|   SNP sites in dbSNP = 1,795 (52.56%) |
|   Transition:transversion ratio (Ts:Tv) = 3.74 |
| CHD (107 individuals) |
|   SNP sites in the BC/BI intersection = 3,431 |
|   SNP sites in dbSNP = 1,724 (50.25%) |
|   Transition:transversion ratio (Ts:Tv) = 3.64 |
| JPT (105 individuals) |
|   SNP sites in the BC/BI intersection = 2,900 |
|   SNP sites in dbSNP = 1,679 (57.90%) |
|   Transition:transversion ratio (Ts:Tv) = 3.67 |
| LWK (108 individuals) |
|   SNP sites in the BC/BI intersection = 5,459 |
|   SNP sites in dbSNP = 2,736 (50.12%) |
|   Transition:transversion ratio (Ts:Tv) = 3.67 |
| YRI (112 individuals) |
|   SNP sites in the BC/BI intersection = 5,175 |
|   SNP sites in dbSNP = 2,785 (53.82%) |
|   Transition:transversion ratio (Ts:Tv) = 3.56 |
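For readers who want to recompute the two summary statistics, the sketch below shows one way to do it. It assumes biallelic SNPs given as hypothetical `(ref, alt)` base pairs; the helper names are illustrative and do not come from either pipeline. The dbSNP percentages are simply the dbSNP count divided by the intersection count.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt alleles must differ for a SNP")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def percent(part, whole):
    """Percentage rounded to two decimals, as in the table above."""
    return round(100.0 * part / whole, 2)


# Invented SNP list: three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ts_tv_ratio(toy_snps))   # 3.0

# dbSNP percentages recomputed from the counts reported above.
print(percent(3869, 12761))    # 30.32 (ALL)
print(percent(2300, 3489))     # 65.92 (CEU)
```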