vSampler: fast and annotation-based matched variant sampling tool

Bioinformatics. 2021 Jul 27;37(13):1915-1917. doi: 10.1093/bioinformatics/btaa883.

Abstract

Summary: Sampling control variants whose properties match those of a set of input variants is widely used in enrichment analysis of genome-wide association studies (GWAS) and quantitative trait loci (QTL), and in constructing negative data for pathogenic/regulatory variant prediction methods. Spurious enrichment caused by confounding factors, such as minor allele frequency and linkage disequilibrium pattern, can be avoided by calibrating statistical significance against matched controls. Here, we present vSampler, which generates sets of randomly drawn variants with a comprehensive choice of matching properties, such as tissue/cell type-specific epigenomic features. Importantly, a novel data structure and sampling algorithms make vSampler significantly faster than existing tools.
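To make the idea of matched-control calibration concrete, here is a minimal sketch of the general technique. It is not vSampler's data structure or sampling algorithm, which the abstract does not describe. The sketch assumes two matching properties only: minor allele frequency (MAF) and a count of variants in linkage disequilibrium ("LD buddies"). Both are discretised into bins whose widths are arbitrary assumptions. For every input variant, one control is drawn from the same bin. Repeating this many times yields a null distribution for the number of variants overlapping an annotation, and an empirical one-sided p-value follows from it. All names here (`Variant`, `match_key`, `build_index`, `sample_matched_set`, `empirical_p`) are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    vid: str
    maf: float         # minor allele frequency
    ld_buddies: int    # number of variants in LD with this one (hypothetical property)
    in_feature: bool   # overlaps the annotation tested for enrichment


def match_key(v: Variant, maf_width: float = 0.05, ld_width: int = 10) -> tuple:
    """Discretise the matching properties into a bin; the bin widths are assumptions."""
    return (int(v.maf / maf_width), v.ld_buddies // ld_width)


def build_index(pool: list) -> dict:
    """Group candidate variants by bin so each matched lookup is a single dict access."""
    index = defaultdict(list)
    for v in pool:
        index[match_key(v)].append(v)
    return index


def sample_matched_set(inputs: list, index: dict, rng: random.Random) -> list:
    """Draw one control per input variant from that input's own bin, never an input itself."""
    input_ids = {v.vid for v in inputs}
    controls = []
    for v in inputs:
        candidates = [c for c in index.get(match_key(v), []) if c.vid not in input_ids]
        if not candidates:
            raise ValueError(f"no matched control available for {v.vid}")
        controls.append(rng.choice(candidates))
    return controls


def empirical_p(inputs: list, index: dict, n_sets: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value for annotation overlap, calibrated on matched control sets."""
    rng = random.Random(seed)
    observed = sum(v.in_feature for v in inputs)
    hits = 0
    for _ in range(n_sets):
        controls = sample_matched_set(inputs, index, rng)
        if sum(c.in_feature for c in controls) >= observed:
            hits += 1
    # The +1 terms count the observed set itself and keep the p-value above zero.
    return (hits + 1) / (n_sets + 1)
```

Because controls share the MAF and LD bin of their input variant, an overlap that merely reflects common or LD-rich variants also shows up in the null sets and is not reported as enrichment. Real tools match on many more properties, such as distance to the nearest gene or tissue-specific epigenomic features.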

Availability and implementation: vSampler web server and local program are available at http://mulinlab.org/vsampler.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome-Wide Association Study*
  • Humans
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci / genetics
  • Software*