Integrated analysis of transcriptomic and proteomic data of Desulfovibrio vulgaris: zero-inflated Poisson regression models to predict abundance of undetected proteins

Bioinformatics. 2006 Jul 1;22(13):1641-7. doi: 10.1093/bioinformatics/btl134. Epub 2006 May 4.

Abstract

Motivation: Integrated analysis of global scale transcriptomic and proteomic data can provide important insights into the metabolic mechanisms underlying complex biological systems. However, because the relationship between protein abundance and mRNA expression level is complicated by many cellular and physical processes, sophisticated statistical models need to be developed to capture their relationship.

Results: In this study, we describe a novel data-driven statistical model to integrate whole-genome microarray and proteomic data collected from Desulfovibrio vulgaris grown under three different conditions. Given the Poisson distribution pattern of the proteomic data and the large number of undetected proteins (excess zeros), zero-inflated Poisson (ZIP)-based models were proposed to describe the correlation pattern between mRNA and protein abundance. In addition, by assuming a probability mass at zero that represents both unexpressed genes and expressed proteins that went undetected owing to technical limitations, a Potential ZIP model was established. This approach introduces two significant improvements: (1) the predicted abundance values for experimentally detected proteins are corrected by taking their mRNA levels into account, and (2) abundance values can be predicted for undetected proteins (in this study, approximately 83% of the proteins in the D. vulgaris genome), enabling better biological interpretation. We demonstrated the use of these statistical models by comparatively analyzing proteomic and microarray results from D. vulgaris grown on lactate-based versus formate-based media. These models correctly predicted increased expression of Ech hydrogenase and decreased expression of Coo hydrogenase for D. vulgaris grown on formate.
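The zero-inflated Poisson idea behind these models can be illustrated with a small sketch. Under a ZIP model, a count Y equals 0 with probability π + (1 − π)e^(−λ) and equals k > 0 with probability (1 − π)λ^k e^(−λ)/k!, so zeros come either from an extra "structural" component (probability π) or from the Poisson component itself. The Python below is a minimal illustration under our own assumptions, not the authors' model: protein abundance is treated as a count, the Poisson mean follows log λ = b0 + b1·x with x a hypothetical covariate standing in for a (log-scale) mRNA level, π is a single shared constant, and the parameters are fitted with a standard EM algorithm. The function names (zip_pmf, fit_zip_regression, simulate, predicted_abundance) and the simulated data are hypothetical and for illustration only.

```python
import math
import random


def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson with Poisson mean lam and extra-zero probability pi."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi + (1.0 - pi) * poisson if y == 0 else (1.0 - pi) * poisson


def weighted_poisson_ll(b0, b1, x, y, w):
    """Poisson log-likelihood (up to a constant) with each point weighted by 1 - w_i."""
    return sum((1.0 - wi) * (yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi))
               for xi, yi, wi in zip(x, y, w))


def fit_zip_regression(x, y, n_iter=200, tol=1e-10):
    """EM fit of log(lam_i) = b0 + b1 * x_i with one shared zero-inflation probability pi."""
    n = len(y)
    positives = [v for v in y if v > 0]
    b0, b1, pi = math.log(sum(positives) / len(positives)), 0.0, 0.5
    for _ in range(n_iter):
        lam = [math.exp(b0 + b1 * xi) for xi in x]
        # E-step: posterior probability that each observed zero is a structural zero.
        w = [pi / (pi + (1.0 - pi) * math.exp(-li)) if yi == 0 else 0.0
             for yi, li in zip(y, lam)]
        # M-step for pi: average of the posterior weights.
        new_pi = sum(w) / n
        # M-step for (b0, b1): one Newton step on the weighted Poisson log-likelihood.
        g0 = g1 = a = b = c = 0.0
        for xi, yi, li, wi in zip(x, y, lam, w):
            r = 1.0 - wi
            g0 += r * (yi - li)
            g1 += r * (yi - li) * xi
            a += r * li
            b += r * li * xi
            c += r * li * xi * xi
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        # Step halving so the weighted log-likelihood does not drop (beyond round-off).
        base, step = weighted_poisson_ll(b0, b1, x, y, w), 1.0
        while (step > 1e-6 and
               weighted_poisson_ll(b0 + step * d0, b1 + step * d1, x, y, w) < base - 1e-9):
            step /= 2.0
        b0, b1 = b0 + step * d0, b1 + step * d1
        converged = abs(step * d0) + abs(step * d1) + abs(new_pi - pi) < tol
        pi = new_pi
        if converged:
            break
    return b0, b1, pi


def simulate(n, b0, b1, pi, seed=0):
    """Draw hypothetical (covariate, count) pairs from the ZIP regression model."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        xi = rng.uniform(0.0, 2.0)
        lam = math.exp(b0 + b1 * xi)
        if rng.random() < pi:
            yi = 0  # structural zero (e.g. unexpressed, or expressed but missed)
        else:
            # Knuth's multiplicative Poisson sampler; adequate for the small means used here.
            limit, yi, p = math.exp(-lam), 0, rng.random()
            while p > limit:
                yi += 1
                p *= rng.random()
        x.append(xi)
        y.append(yi)
    return x, y


def predicted_abundance(x_new, b0, b1):
    """Fitted Poisson mean: the abundance the model expects from the covariate alone."""
    return math.exp(b0 + b1 * x_new)


if __name__ == "__main__":
    x, y = simulate(2000, b0=0.5, b1=1.0, pi=0.3, seed=0)
    b0_hat, b1_hat, pi_hat = fit_zip_regression(x, y)
    print(f"b0={b0_hat:.3f} b1={b1_hat:.3f} pi={pi_hat:.3f}")
    print(f"expected abundance at x=1.5: {predicted_abundance(1.5, b0_hat, b1_hat):.2f}")
```

Because the fitted mean exp(b0 + b1·x) depends only on the covariate, it can be evaluated for items whose observed count is zero. This mirrors, in simplified form, the idea of assigning abundance values to undetected proteins; the published models may differ in how they parameterize the zero component and the mRNA dependence.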

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Adenosine Triphosphate / chemistry
  • Calibration
  • Chromatography, Liquid / methods
  • Computational Biology / methods*
  • Desulfovibrio vulgaris / genetics
  • Desulfovibrio vulgaris / metabolism*
  • Mass Spectrometry
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis
  • Poisson Distribution
  • Probability
  • Proteins / chemistry*
  • Proteomics / methods*
  • RNA, Messenger / metabolism*
  • Regression Analysis

Substances

  • Proteins
  • RNA, Messenger
  • Adenosine Triphosphate