Series GSE9307
Status Public on Nov 06, 2007
Title DNA pooling using Affymetrix HindIII and Illumina HumanHap arrays
Organism Homo sapiens
Experiment type Other
Summary The experiment used 3 arrays of each type (3 Illumina HumanHap300 and 3 Affymetrix Genechip HindIII arrays), each hybridized to a single pool containing equal amounts of DNA from each of 384 individuals. The goal is to estimate a pooling allele frequency, that is, the average frequency of allele 1, say, in the set of 384 individuals. After processing, the raw data are summarized to give pooling allele frequency estimates for each array.
The abstract of the paper comparing the two array types (one Affymetrix, one Illumina) is as follows: Genome wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and show that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300 based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII based pooling only extracts ~30% of the available information. With HumanHap300 arrays concordance with IG data is excellent. Guidance is given on best study design and it is shown that even after taking into account pooling error, one stage scans can be performed for >100 fold reduced cost compared with IG. With appropriately designed two stage studies, IG can provide confirmation of pooling results whilst still providing ~20 fold reduction in total cost compared with IG based alternatives. The large cost savings with Illumina HumanHap300 based pooling imply that future studies need only be limited by the availability of samples and not cost.
Keywords: DNA pooling experiment
 
Overall design A pool was typed using 3 Affymetrix HindIII arrays. Estimates of pooling allele frequency were obtained.
Pool of 384 individuals: the data in the matrix are for 3 replicate arrays on this pool. Information on each individual array is in the uploaded files, which contain the raw match and mismatch scores.
Further details on the raw probe score data included with this submission are given below.
For Affymetrix arrays, there is information on Perfect Match (PM) and Mis-Match (MM) intensities for each allele. Essentially the estimate of pooling allele frequency (PAF) comes from A/(A+B), where A is PM-MM for allele A and B is PM-MM for allele B.
The header line of the affy file reads "0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PB(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MB(Sense)", so the order of the information is PM-(allele A), PM-(allele B), MM-(allele A), MM-(allele B).
The same information is also available on the anti-sense strand of the array. Further, the sense and antisense strands have data at up to 7 locations on the array. That is, the 4 columns of PM/MM data are repeated 14 (2*7) times, so there are 56 columns of data plus the column with the SNP name. In practice only 10 of the possible 14 sets of 4 columns give valid data, so there are 10 sets of 4 columns that yield estimates of A/(A+B) (i.e. pooling allele frequency estimates or PAFs).
These 10 PAFs are accumulated over multiple arrays and finally used to get an overall estimate (using a statistical model described in Macgregor et al, Nucleic Acids Research, 34(7):e55, 2006) of the frequency of a particular allele in the set of pooled individuals. In our case, 3 Affymetrix arrays are used for each pool, so there are up to 30 PAF values used in the final calculation. In practice some of the individual PAFs are not included because they fail on the array. Since the array data are fairly noisy, we use the ~30 PAF values to bring down the array error; essentially it is the up to 30 fold redundancy of the array that enables the pooling to work satisfactorily.
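As a rough illustration of the A/(A+B) calculation described above, the following Python sketch walks one SNP's probe values in groups of four (PM-A, PM-B, MM-A, MM-B) and returns the PAFs that can be computed. The function names, the representation of missing probes as None, and the rule for discarding quartets with negative or zero signal are assumptions made for this sketch; they are not taken from the submission, and the sketch does not implement the statistical model of Macgregor et al. (2006) used to combine PAFs across arrays.

```python
from typing import List, Optional, Sequence

QUARTET = 4                      # PM-A, PM-B, MM-A, MM-B, in that column order
STRANDS = 2                      # sense and antisense
LOCATIONS = 7                    # up to 7 probe locations per strand
EXPECTED_COLUMNS = QUARTET * STRANDS * LOCATIONS  # 56 data columns per SNP


def quartet_paf(pa: float, pb: float, ma: float, mb: float) -> Optional[float]:
    """Return A/(A+B) for one quartet, where A = PM-MM for allele A, B likewise."""
    a = pa - ma
    b = pb - mb
    # Assumption: quartets with a negative signal or no signal are treated as failed.
    if a < 0 or b < 0 or a + b <= 0:
        return None
    return a / (a + b)


def row_pafs(values: Sequence[Optional[float]]) -> List[float]:
    """Compute the usable PAFs for one SNP from its flat list of probe values."""
    pafs: List[float] = []
    for start in range(0, len(values) - QUARTET + 1, QUARTET):
        quartet = values[start:start + QUARTET]
        if any(v is None for v in quartet):   # hypothetical marker for missing probes
            continue
        paf = quartet_paf(*quartet)
        if paf is not None:
            pafs.append(paf)
    return pafs


# Toy values (not real array data): three quartets, the last one fails.
example_row = [300, 100, 100, 100,
               200, 200, 100, 100,
               50, 50, 100, 100]
example_pafs = row_pafs(example_row)
```

With real data, each row of conrep1.csv, conrep2.csv and conrep3.csv would contribute up to 10 valid PAFs, giving the up to 30 values per SNP mentioned above.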
The raw probe score data for the 3 arrays is in the text files conrep1.csv, conrep2.csv and conrep3.csv

The summary file, forgeoconpoolfreq.txt, contains the rs and Illumina names (Code), along with the physical position, chromosome and estimate of allele frequency for the pool based on the raw data from all 3 arrays used. The estimate of pooling allele frequency (PAF) comes from PAF=R/(R+G) where R and G are the red and green intensities respectively. The PAFs were normalized to ensure that the mean allele frequency was 0.5 over each strand of the array (i.e. over each of the 10 sets of ~30k SNPs on the array).
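The PAF=R/(R+G) calculation and the idea of normalizing so that the mean allele frequency is 0.5 can be sketched in Python as follows. The record does not say how the normalization was carried out, so the approach here (rescaling the green channel by a single constant k, found by bisection) is only one plausible choice and is an assumption of this sketch, as are all function names. All intensities are assumed to be strictly positive.

```python
from typing import List, Sequence


def raw_paf(red: float, green: float) -> float:
    """Unnormalized pooling allele frequency, PAF = R / (R + G)."""
    return red / (red + green)


def mean_paf(reds: Sequence[float], greens: Sequence[float], k: float) -> float:
    """Mean PAF over all SNPs after scaling the green channel by k."""
    return sum(r / (r + k * g) for r, g in zip(reds, greens)) / len(reds)


def find_green_scale(reds: Sequence[float], greens: Sequence[float],
                     target: float = 0.5, iterations: int = 200) -> float:
    """Find k such that the mean of R / (R + k*G) equals the target.

    The mean PAF decreases monotonically as k grows, so bisection on a
    logarithmic scale converges. This scaling scheme is an assumption,
    not the documented procedure.
    """
    lo, hi = 1e-6, 1e6
    for _ in range(iterations):
        mid = (lo * hi) ** 0.5          # geometric midpoint
        if mean_paf(reds, greens, mid) > target:
            lo = mid                    # mean too high: scale green up further
        else:
            hi = mid
    return (lo * hi) ** 0.5


def normalized_pafs(reds: Sequence[float], greens: Sequence[float]) -> List[float]:
    """PAFs after rescaling green so that their mean is 0.5."""
    k = find_green_scale(reds, greens)
    return [r / (r + k * g) for r, g in zip(reds, greens)]


# Toy intensities (not real array data).
reds = [100.0, 300.0]
greens = [100.0, 100.0]
pafs = normalized_pafs(reds, greens)
```

In a real analysis the normalization would be applied separately to each group of SNPs over which the mean is constrained, rather than to the whole array at once.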
 
Contributor(s) Macgregor S, Zhao Z, Henders A, Martin NG, Montgomery GW, Visscher PM
Citation(s) 18276640
Submission date Oct 12, 2007
Last update date Dec 22, 2017
Contact name Stuart Macgregor
E-mail(s) stuart.macgregor@qimr.edu.au
Organization name Queensland Institute of Medical Research
Department Genetic Epidemiology
Street address 300 Herston Road
City Brisbane
State/province Queensland
ZIP/Postal code 4029
Country Australia
 
Platforms (2)
GPL2004 [Mapping50K_Hind240] Affymetrix Human Mapping 50K Hind240 SNP Array
GPL6083 Illumina HumanHap300 array
Samples (2)
GSM237112 Pool of 384 individuals (Affymetrix)
GSM241088 Pool of 384 individuals (Illumina)
Relations
BioProject PRJNA102959

Supplementary file Size File type/resource
GSE9307_RAW.tar 168.7 Mb TAR (of CEL, TXT)
GSE9307_forgeoconpoolfreq.txt 14.7 Mb TXT
Processed data included within Sample table
