Protein and gene model inference based on statistical modeling in k-partite graphs

Proc Natl Acad Sci U S A. 2010 Jul 6;107(27):12101-6. doi: 10.1073/pnas.0907654107. Epub 2010 Jun 18.

Abstract

One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides be mapped back to the proteins from which they were derived, a process known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions that addresses the problem of protein and gene model inference for shotgun proteomics data. In particular, we handle dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We also address the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that MIPGEM is competitive with existing tools for protein inference.
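
The shared-peptide problem mentioned in the abstract can be illustrated with a small sketch. The Python code below is not the MIPGEM model and performs no statistical scoring; it only builds a toy tripartite mapping (peptide, protein, gene model) and reports which peptides are ambiguous at the protein level and which remain ambiguous at the gene model level. All identifiers and data (PEP_A, P1, G1, classify_peptides, and so on) are hypothetical and chosen for illustration.

```python
# Toy illustration of shared peptides in a peptide -> protein -> gene model graph.
# Assumption: all identifiers and mappings below are invented for this sketch.

# Edges between the peptide layer and the protein layer.
PEPTIDE_TO_PROTEINS = {
    "PEP_A": ["P1"],        # unique to one protein
    "PEP_B": ["P1", "P2"],  # shared by two proteins of the same gene model
    "PEP_C": ["P2", "P3"],  # shared by proteins of different gene models
    "PEP_D": ["P4"],        # unique to one protein
}

# Edges between the protein layer and the gene model layer.
PROTEIN_TO_GENE = {"P1": "G1", "P2": "G1", "P3": "G2", "P4": "G3"}


def classify_peptides(pep_to_prot, prot_to_gene):
    """Return, per peptide, its proteins, gene models, and ambiguity flags."""
    result = {}
    for peptide, proteins in pep_to_prot.items():
        # Collapse the matching proteins onto the gene models encoding them.
        genes = {prot_to_gene[p] for p in proteins}
        result[peptide] = {
            "proteins": sorted(set(proteins)),
            "gene_models": sorted(genes),
            "shared_at_protein_level": len(set(proteins)) > 1,
            "shared_at_gene_level": len(genes) > 1,
        }
    return result


if __name__ == "__main__":
    summary = classify_peptides(PEPTIDE_TO_PROTEINS, PROTEIN_TO_GENE)
    for peptide, info in summary.items():
        print(peptide, info)
```

In this toy example, PEP_B is ambiguous among proteins but maps to a single gene model, whereas PEP_C stays ambiguous at both levels. This is one way to see why moving the inference from proteins to the encoding gene models can resolve part of the ambiguity.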

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Arabidopsis Proteins / analysis
  • Computational Biology / methods*
  • Databases, Protein
  • Drosophila Proteins / analysis
  • Markov Chains
  • Models, Statistical*
  • Peptides / analysis
  • Proteins / analysis*
  • Proteome / analysis
  • Proteomics / methods*
  • Reproducibility of Results
  • Saccharomyces cerevisiae Proteins / analysis

Substances

  • Arabidopsis Proteins
  • Drosophila Proteins
  • Peptides
  • Proteins
  • Proteome
  • Saccharomyces cerevisiae Proteins