Most highly expressed protein-coding genes have a single dominant isoform

J Proteome Res. 2015 Apr 3;14(4):1880-7. doi: 10.1021/pr501286b. Epub 2015 Mar 11.

Abstract

Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells or instead produce dominant isoforms, whether tissue-specific or independent of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources: the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, which are predicted automatically from the conservation of protein sequence, structure, and function.
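The idea of calling a dominant isoform from peptide evidence can be illustrated with a minimal Python sketch. It is not the procedure used in the study, whose details are not given in this abstract. It assumes a hypothetical input of (gene, isoform, peptide) records listing every isoform each detected peptide is compatible with, ignores peptides shared by all observed isoforms of a gene, and calls an isoform dominant only when it has strictly more discriminating peptides than any other. The function name, the record layout, and the gene, isoform, and peptide identifiers are all illustrative.

```python
from collections import defaultdict


def dominant_isoforms(peptide_hits):
    """Return {gene: isoform} for genes with one best-supported isoform.

    peptide_hits: iterable of (gene, isoform, peptide) tuples (hypothetical
    layout), one tuple per isoform that a detected peptide maps to.
    Genes with a tie, or with no discriminating peptides, are omitted.
    """
    # gene -> peptide -> set of isoforms the peptide is compatible with
    mapping = defaultdict(lambda: defaultdict(set))
    for gene, isoform, peptide in peptide_hits:
        mapping[gene][peptide].add(isoform)

    result = {}
    for gene, peptides in mapping.items():
        observed = set().union(*peptides.values())
        counts = defaultdict(int)
        for isoforms in peptides.values():
            # A peptide shared by every observed isoform cannot discriminate.
            if len(observed) > 1 and isoforms == observed:
                continue
            for isoform in isoforms:
                counts[isoform] += 1
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            continue
        # Require a strict lead over the runner-up (assumed criterion).
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            result[gene] = ranked[0][0]
    return result


# Toy data: identifiers are made up for illustration.
hits = [
    ("GENE_A", "A-201", "PEP1"), ("GENE_A", "A-202", "PEP1"),  # shared
    ("GENE_A", "A-201", "PEP2"),
    ("GENE_A", "A-201", "PEP3"),
    ("GENE_A", "A-202", "PEP4"),
    ("GENE_B", "B-201", "PEP5"),
    ("GENE_B", "B-202", "PEP6"),  # tie with B-201, so no call for GENE_B
]
print(dominant_isoforms(hits))
```

A real analysis would also need the full set of annotated isoforms per gene, peptide identification confidence, and a stricter threshold than a one-peptide lead; the sketch leaves these out for brevity.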

Keywords: Alternative splicing; Dominant isoforms; Large-scale proteomics; Protein function; Protein structure; RNAseq.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Protein
  • Humans
  • Open Reading Frames / genetics*
  • Peptides / genetics*
  • Protein Isoforms / genetics*
  • Proteomics / methods*

Substances

  • Peptides
  • Protein Isoforms