BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions

Meng Huang; Xiaolei Liu; Yao Zhou; Ryan M Summers; Zhiwu Zhang

doi:10.1093/gigascience/giy154

Gigascience. 2019 Feb 1;8(2):giy154. doi: 10.1093/gigascience/giy154.

Authors

Meng Huang¹, Xiaolei Liu², Yao Zhou¹, Ryan M Summers³, Zhiwu Zhang¹

Affiliations

¹ Department of Crop and Soil Sciences, Washington State University, 1170 NE Stadium Way, Pullman, Washington, 99164-6420, USA.
² Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, 1 Shizishan Street, Wuhan, Hubei, 430070, China.
³ School of Electrical Engineering and Computer Science, Washington State University, 355 NE Spokane Street, Pullman, Washington, 99164-2752, USA.

Abstract

Big datasets, accumulated from biomedical and agronomic studies, offer the potential to identify genes that control complex human diseases and agriculturally important traits through genome-wide association studies (GWAS). However, big datasets also pose extreme computational challenges, especially when sophisticated statistical models are employed to simultaneously reduce false positives and false negatives. The newly developed fixed and random model circulating probability unification (FarmCPU) method uses a bin method under the assumption that quantitative trait nucleotides (QTNs) are evenly distributed throughout the genome. The estimated QTNs are used to separate a mixed linear model into a computationally efficient fixed effect model (FEM) and a computationally expensive random effect model (REM), which are then used iteratively. To completely eliminate the computationally expensive REM, we replaced REM with FEM by using the Bayesian information criterion (BIC). To eliminate the requirement that QTNs be evenly distributed throughout the genome, we replaced the bin method with linkage disequilibrium information. The new method is called Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK). Both real and simulated data analyses demonstrated that BLINK improves statistical power compared with FarmCPU while also substantially reducing computing time. A dataset with one million individuals and one-half million markers can now be analyzed within three hours, compared with one week using FarmCPU.
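The loop described in the abstract (test markers with a fixed effect model, drop candidate QTNs that are in linkage disequilibrium with a more significant one, then let BIC decide which remaining candidates become pseudo-QTN covariates) can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the BLINK package's implementation: the function names, the r² cutoff of 0.7, the p-value threshold, the nested forward selection by BIC, and the normal approximation used for p-values are all choices made here for illustration only.

```python
import math
import numpy as np

def marker_pvalues(G, y, pseudo):
    """Fixed effect model scan: for each marker j, fit y ~ 1 + pseudo-QTNs + G[:, j].
    A pseudo-QTN is tested against the other pseudo-QTNs only.
    Two-sided p-values use a normal approximation to the t distribution (assumption)."""
    n, m = G.shape
    p = np.ones(m)
    for j in range(m):
        cov = np.asarray([k for k in pseudo if k != j], dtype=int)
        X = np.column_stack([np.ones(n), G[:, cov], G[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        if rank < X.shape[1]:
            continue  # marker is collinear with the covariates; leave p = 1
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
        p[j] = math.erfc(abs(beta[-1] / se) / math.sqrt(2))
    return p

def ld_prune(G, pvalues, candidates, r2_threshold=0.7):
    """Visit candidates from most to least significant; drop any marker whose
    squared correlation with an already-kept marker exceeds r2_threshold."""
    kept = []
    for j in sorted(candidates, key=lambda c: pvalues[c]):
        if all(np.corrcoef(G[:, j], G[:, k])[0, 1] ** 2 <= r2_threshold for k in kept):
            kept.append(j)
    return kept

def bic_ols(X, y):
    """BIC of an ordinary least squares fit: n*log(RSS/n) + k*log(n)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * math.log(rss / n) + X.shape[1] * math.log(n)

def select_by_bic(G, y, ordered):
    """Nested models: add candidates in significance order, keep the prefix
    with the lowest BIC (hypothetical selection scheme)."""
    n = len(y)
    intercept = np.ones((n, 1))
    best_k, best_bic = 0, bic_ols(intercept, y)
    for k in range(1, len(ordered) + 1):
        b = bic_ols(np.column_stack([intercept, G[:, ordered[:k]]]), y)
        if b < best_bic:
            best_k, best_bic = k, b
    return ordered[:best_k]

def blink_sketch(G, y, p_threshold=1e-3, r2_threshold=0.7, max_iter=10):
    """Iterate FEM scan -> LD pruning -> BIC selection until the pseudo-QTN set stops changing."""
    pseudo = []
    p = marker_pvalues(G, y, pseudo)
    for _ in range(max_iter):
        candidates = [int(j) for j in np.argsort(p) if p[j] < p_threshold]
        new = select_by_bic(G, y, ld_prune(G, p, candidates, r2_threshold))
        if sorted(new) == sorted(pseudo):
            break
        pseudo = new
        p = marker_pvalues(G, y, pseudo)
    return p, pseudo
```

Note that no random effect (kinship) model appears anywhere in the loop, which reflects the abstract's point that replacing REM with a BIC-driven FEM step removes the expensive part of FarmCPU. Likewise, candidate QTNs are filtered by pairwise LD rather than by genome bins. A real GWAS tool would also restrict LD checks to nearby markers and use exact t-distribution p-values.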

Publication types

Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Bayes Theorem
Computational Biology / methods
Female
Genome-Wide Association Study / methods*
Humans
Linkage Disequilibrium
Male
Models, Genetic
Models, Statistical*
Plants / genetics
Polymorphism, Single Nucleotide*
Software*