Series GSE9307

Status

Public on Nov 06, 2007

Title

DNA pooling using Affymetrix HindIII and Illumina HumanHap arrays

Organism

Homo sapiens

Experiment type

Other

Summary

The experiment was based on 3 arrays of each type (3 Illumina HumanHap300 and 3 Affymetrix Genechip HindIII arrays), each hybridized to a single pool that contained equal amounts of DNA from each of 384 individuals. The goal is to estimate a pooling allele frequency: the average frequency of one allele (allele 1, say) in the set of 384 individuals. After processing, the raw data are summarized to give pooling allele frequency estimates for each array.
The abstract of the paper comparing the two array types (one Affymetrix, one Illumina) is as follows: Genome wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and show that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300 based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII based pooling only extracts ~30% of the available information. With HumanHap300 arrays concordance with IG data is excellent. Guidance is given on best study design and it is shown that even after taking into account pooling error, one stage scans can be performed for >100 fold reduced cost compared with IG. With appropriately designed two stage studies, IG can provide confirmation of pooling results whilst still providing ~20 fold reduction in total cost compared with IG based alternatives. The large cost savings with Illumina HumanHap300 based pooling imply that future studies need only be limited by the availability of samples and not cost.
Keywords: DNA pooling experiment

Overall design

A pool was typed using 3 Affymetrix HindIII arrays. Estimates of pooling allele frequency were obtained.
Pool of 384 individuals: the data in the matrix are for 3 replicate arrays on this pool. Information on each individual array is in the uploaded files, which contain the raw match and mismatch scores.
Further details on the raw probe score data included with this submission are given below.
For Affymetrix arrays, there is information on Perfect Match (PM) and Mis-Match (MM) intensities for each allele. Essentially, the estimate of pooling allele frequency (PAF) comes from A/(A+B), where A is PM-MM for allele A and B is PM-MM for allele B.
The header line of the Affymetrix file reads "0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - PB(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MA(Sense)","0586-EN Control Pool Repl 1 50KHind 09-03-05_Quartet1 - MB(Sense)", so the order of the information is PM (allele A), PM (allele B), MM (allele A), MM (allele B).
This same information is also available on the antisense strand of the array. Further, the sense and antisense strands have data at up to 7 locations on the array. That is, the 4 columns of PM/MM data are repeated 14 (2*7) times, so there are 56 columns of data plus the column with the SNP name. In practice only 10 of the possible 14 sets of 4 columns give valid data, so there are 10 sets of 4 columns that yield estimates of A/(A+B) (i.e. pooling allele frequency estimates, or PAFs).
These 10 PAFs are accumulated over multiple arrays and finally used to get an overall estimate (using a statistical model described in Macgregor et al, Nucleic Acids Research, 34(7):e55, 2006) of the frequency of a particular allele in the set of pooled individuals. In our case, 3 Affymetrix arrays are used for each pool, so there are up to 30 PAF values used in the final calculation. In practice some of the individual PAFs are not included because they fail on the array. Since the array data are fairly noisy, we use the ~30 PAF values to bring down the array error: essentially, it is the up to 30-fold redundancy of the array that enables the pooling to work satisfactorily.
The raw probe score data for the 3 arrays is in the text files conrep1.csv, conrep2.csv and conrep3.csv
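
The per-quartet calculation described above can be sketched in Python as follows. This is only an illustration, not the submitters' pipeline: the function names are hypothetical, the rule of discarding quartets with a negative PM-MM difference is an assumption, and the plain mean used to combine PAFs is a simple stand-in for the statistical model of Macgregor et al (2006), which is not reproduced here.

```python
from statistics import mean


def quartet_paf(pm_a, pm_b, mm_a, mm_b):
    """Return A/(A+B) with A = PM_A - MM_A and B = PM_B - MM_B.

    Returns None when the quartet is unusable. Treating negative
    differences as failures is an assumption made for this sketch.
    """
    a = pm_a - mm_a
    b = pm_b - mm_b
    if a < 0 or b < 0 or a + b <= 0:
        return None
    return a / (a + b)


def row_pafs(values):
    """Compute one PAF per quartet from the data columns of one SNP row.

    `values` holds the intensity columns (SNP name excluded) in the
    repeating order PM-A, PM-B, MM-A, MM-B. Missing values are given
    as None and cause that quartet to be skipped.
    """
    if len(values) % 4 != 0:
        raise ValueError("expected a multiple of 4 intensity columns")
    pafs = []
    for i in range(0, len(values), 4):
        quartet = values[i:i + 4]
        if any(v is None for v in quartet):
            continue
        paf = quartet_paf(*quartet)
        if paf is not None:
            pafs.append(paf)
    return pafs


def pooled_estimate(rows_for_one_snp):
    """Average all valid PAFs for one SNP across replicate arrays.

    A plain mean is used here as a placeholder for the published model.
    """
    pafs = [p for values in rows_for_one_snp for p in row_pafs(values)]
    return mean(pafs) if pafs else None
```

For one SNP, `rows_for_one_snp` would hold the intensity columns from the matching row of each of conrep1.csv, conrep2.csv and conrep3.csv, giving up to 30 PAFs once the invalid quartets are dropped.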

The summary file, forgeoconpoolfreq.txt, contains the rs and Illumina names (Code), along with the physical position, chromosome and estimate of allele frequency for the pool based on the raw data from all 3 arrays used. The estimate of pooling allele frequency (PAF) comes from PAF=R/(R+G) where R and G are the red and green intensities respectively. The PAFs were normalized to ensure that the mean allele frequency was 0.5 over each strand of the array (i.e. over each of the 10 sets of ~30k SNPs on the array).
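
The two-channel estimate and the normalization to a mean of 0.5 can be illustrated with a short Python sketch. The record does not say how the normalization was carried out, so the approach below (rescaling the green channel by a single factor k, found by bisection, within one set of SNPs) is an assumption chosen for illustration, and all function names are hypothetical.

```python
from statistics import mean


def raw_paf(red, green):
    """PAF = R / (R + G) for one SNP; intensities must sum to a positive value."""
    if red + green <= 0:
        raise ValueError("red + green must be positive")
    return red / (red + green)


def normalise_set(reds, greens, target=0.5, iterations=100):
    """Rescale green by a factor k so that the mean PAF of the set equals `target`.

    Assumed scheme, not taken from the record. The mean of R/(R + k*G)
    decreases as k grows, so k can be located by bisection on a log scale.
    Returns (k, list of normalised PAFs).
    """
    if any(r <= 0 or g <= 0 for r, g in zip(reds, greens)):
        raise ValueError("this sketch expects strictly positive intensities")

    def mean_paf(k):
        return mean(r / (r + k * g) for r, g in zip(reds, greens))

    lo, hi = 1e-6, 1e6
    for _ in range(iterations):
        k = (lo * hi) ** 0.5  # geometric midpoint
        if mean_paf(k) > target:
            lo = k  # mean too high: green needs more weight
        else:
            hi = k
    k = (lo * hi) ** 0.5
    return k, [r / (r + k * g) for r, g in zip(reds, greens)]
```

In this sketch the function would be applied separately to each set of SNPs that is to be centred on 0.5, so each set gets its own factor k.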

Contributor(s)

Macgregor S, Zhao Z, Henders A, Martin NG, Montgomery GW, Visscher PM

Citation(s)

18276640

Submission date

Oct 12, 2007

Last update date

Dec 22, 2017

Contact name

Stuart Macgregor

E-mail(s)

stuart.macgregor@qimr.edu.au

Organization name

Queensland Institute of Medical Research

Department

Genetic Epidemiology

Street address

300 Herston Road

City

Brisbane

State/province

Queensland

ZIP/Postal code

4029

Country

Australia

Platforms (2)

GPL2004	[Mapping50K_Hind240] Affymetrix Human Mapping 50K Hind240 SNP Array
GPL6083	Illumina HumanHap300 array

Samples (2)

GSM237112	Pool of 384 individuals (Affymetrix)
GSM241088	Pool of 384 individuals (Illumina)

Relations

BioProject

PRJNA102959

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	Download	File type/resource
GSE9307_RAW.tar	168.7 Mb	(http)(custom)	TAR (of CEL, TXT)
GSE9307_forgeoconpoolfreq.txt	14.7 Mb	(ftp)(http)	TXT
Processed data included within Sample table