Bayes empirical bayes inference of amino acid sites under positive selection

Mol Biol Evol. 2005 Apr;22(4):1107-18. doi: 10.1093/molbev/msi097. Epub 2005 Feb 2.

Abstract

Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (d(N)/d(S), denoted omega) is used as a measure of selective pressure at the protein level, with omega > 1 indicating positive selection. Statistical distributions are used to model the variation in omega among sites, allowing a subset of sites to have omega > 1 while the rest of the sequence may be under purifying selection with omega < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with omega > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and omega ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alleles
  • Amino Acids / genetics*
  • Bayes Theorem*
  • Computer Simulation
  • Genes, Viral
  • Humans
  • Major Histocompatibility Complex / genetics
  • Selection, Genetic*

Substances

  • Amino Acids