Method Detail
Submitter Method Handle: 1000GENOMES
Submitted method description:
1000 Genomes Project Pilot 3, March 2010 SNP call release:
Samples: 697 total individuals from 7 HapMap populations.
90 CEU, 66 TSI, 109 CHB, 107 CHD, 105 JPT, 108 LWK, 112 YRI.
Exon targets: The initial target selection covered 1,020 genes
(~10,000 exons or 2.3 Mb), using CCDS annotations. The current
data release includes SNP calls only within regions that were
successfully captured at all four data-producing centers. These
amount to 8,496 regions with total length ~1.43 Mb. The genes and
exact region boundaries relative to the NCBI version 36 genome
assembly are reported in files:
and /P3_consensus_exonic_targets.bed (UCSC .bed format file).
The enrichment protocols will be described in a forthcoming paper.
DNA samples were sequenced with two sequencing technologies (454
at BCM, Illumina at BI, WTSI, WUGSC). Overall average sequence
coverage ranges from 30x to 60x per individual in each population.
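The region count and total target length quoted above can be recomputed from a UCSC .bed file. The following is a minimal sketch, not part of the release: the function name and the example records are hypothetical, and it assumes the regions do not overlap (overlapping regions would need merging first).

```python
def bed_total_length(lines):
    """Return (region_count, total_bases) for an iterable of BED lines.

    BED coordinates are 0-based and half-open, so a region's length
    is simply end - start. Assumes regions do not overlap.
    """
    count = 0
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        count += 1
        total += end - start
    return count, total


# Hypothetical records, for illustration only.
example = [
    "track name=targets",
    "chr1\t100\t250\texonA",
    "chr1\t400\t500\texonB",
    "chr2\t0\t75\texonC",
]
print(bed_total_length(example))  # (3, 325)
```

For a file on disk, passing an open file handle works the same way, since the function only iterates over lines.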
Data processing pipelines: The present release is based on SNP
calls made at two analysis centers: Boston College (Amit Indap,
Wen Fung Leong, and Gabor Marth) and the Broad Institute
(Kiran Garimella and Chris Hartl). For pipeline details, see the
individual call README information. Summary information is below.
Boston College Pipeline:
Read mapper: MOSAIK
Duplicate removal: BCMMarkduplicates (454 data); Picard MarkDuplicates (Illumina data)
Base quality recalibration: GATK (Illumina data); none (454 data)
SNP caller: GigaBayes (BamBayes)
Version date: 2010 February 02
Broad Institute Pipeline:
Read mapper: MAQ (Illumina data); SSAHA2 (454 data)
Duplicate removal: Picard MarkDuplicates (454 and Illumina data)
Base quality recalibration: GATK (454 and Illumina data)
SNP caller: UnifiedGenotyper
Version date: 2010 January 26
Component SNP call characteristics: The Boston College SNP calls
are made using all 697 samples simultaneously. Per-population call
sets are derived by reporting, for a given population, the subset
of called sites that included a variant genotype in at least one of
the individuals in that population (i.e. sites that segregate in
that population). The Broad Institute SNP calls are made separately
within each of the 7 populations. A list of SNPs containing sites
that segregated in any of the 697 individuals is produced as a union
of the population-specific calls. The final release set contains
SNP (variant) sites that are present in both the Boston College and
the Broad Institute call sets. If an individual's genotype differs
between the BC and BI call sets, or is missing in one set, the genotype
is reported as missing. The fraction of missing genotypes in each of
the 7 populations is: CEU 0.73%, TSI 0.64%, CHB 0.50%, CHD 0.82%,
JPT 0.99%, LWK 0.36%, YRI 0.52%.
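The merging rules described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the release: the data layout (a dict mapping a site to per-sample genotype strings), the "./." missing-genotype convention, the function names, and the sample IDs are all hypothetical.

```python
MISSING = "./."


def consensus_calls(bc, bi):
    """Keep sites present in both call sets; a genotype that differs
    between the sets, or is absent from one, is reported as missing."""
    merged = {}
    for site in bc.keys() & bi.keys():
        samples = bc[site].keys() | bi[site].keys()
        merged[site] = {}
        for sample in samples:
            g1 = bc[site].get(sample, MISSING)
            g2 = bi[site].get(sample, MISSING)
            merged[site][sample] = g1 if g1 == g2 else MISSING
    return merged


def is_variant(genotype):
    """True if the genotype carries at least one non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def segregating_sites(calls, population):
    """Sites with a variant genotype in at least one listed sample."""
    return {
        site
        for site, genotypes in calls.items()
        if any(is_variant(genotypes.get(s, MISSING)) for s in population)
    }


def missing_fraction(calls):
    """Fraction of all genotypes that are missing."""
    genotypes = [g for site in calls.values() for g in site.values()]
    return sum(g == MISSING for g in genotypes) / len(genotypes)


# Hypothetical toy call sets: two samples, three candidate sites.
bc = {
    ("1", 100): {"S1": "0/1", "S2": "0/0"},
    ("1", 200): {"S1": "1/1", "S2": "0/1"},
    ("1", 300): {"S1": "0/1", "S2": "0/0"},
}
bi = {
    ("1", 100): {"S1": "0/1", "S2": "0/1"},  # S2 discordant
    ("1", 200): {"S1": "1/1"},               # S2 missing
}

merged = consensus_calls(bc, bi)
print(sorted(merged))                      # [('1', 100), ('1', 200)]
print(missing_fraction(merged))            # 0.5
print(segregating_sites(merged, ["S2"]))   # set()
```

In the toy data, the site at position 300 is dropped because only one call set reports it, and both S2 genotypes become missing, so no site segregates in a population consisting of S2 alone.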
SNP site statistics:
Population  Individuals  BC/BI intersection SNP sites  Sites in dbSNP   Ts:Tv
ALL         697          12,761                        3,869 (30.32%)   3.81
CEU         90           3,489                         2,300 (65.92%)   3.47
TSI         66           3,281                         2,152 (65.59%)   3.54
CHB         109          3,415                         1,795 (52.56%)   3.74
CHD         107          3,431                         1,724 (50.25%)   3.64
JPT         105          2,900                         1,679 (57.90%)   3.67
LWK         108          5,459                         2,736 (50.12%)   3.67
YRI         112          5,175                         2,785 (53.82%)   3.56
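The two summary statistics reported above are simple to recompute from a list of SNP sites. The sketch below is illustrative only: the function names and the toy allele pairs are hypothetical, and it assumes biallelic SNPs given as (reference, alternate) base pairs. The dbSNP percentages use the counts listed above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


def ts_tv_ratio(snps):
    """Transitions:transversions ratio for (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def percent_in_dbsnp(known_sites, total_sites):
    """Percentage of called sites already present in dbSNP."""
    return round(100.0 * known_sites / total_sites, 2)


# Hypothetical toy SNPs: three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
print(ts_tv_ratio(toy_snps))          # 3.0

# Counts taken from the statistics above.
print(percent_in_dbsnp(3869, 12761))  # 30.32 (all 697 individuals)
print(percent_in_dbsnp(2300, 3489))   # 65.92 (CEU)
```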

This method was used in the following submissions:

Submitter Handle  Batch Type  Submitter batch id    Release build id
1000GENOMES       Frequency   pilot_3_CEU_mar_2010  132
1000GENOMES       Frequency   pilot_3_CHB_mar_2010  132
1000GENOMES       Frequency   pilot_3_CHD_mar_2010  132
1000GENOMES       Frequency   pilot_3_JPT_mar_2010  132
1000GENOMES       Frequency   pilot_3_TSI_mar_2010  132
1000GENOMES       Frequency   pilot_3_LWK_mar_2010  132
1000GENOMES       Frequency   pilot_3_YRI_mar_2010  132
1000GENOMES       Assay       pilot_3_CEU_mar_2010  132
1000GENOMES       Assay       pilot_3_CHB_mar_2010  132
1000GENOMES       Assay       pilot_3_CHD_mar_2010  132
1000GENOMES       Assay       pilot_3_JPT_mar_2010  132
1000GENOMES       Assay       pilot_3_LWK_mar_2010  132
1000GENOMES       Assay       pilot_3_TSI_mar_2010  132
1000GENOMES       Assay       pilot_3_YRI_mar_2010  132