PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences

Int J Mol Sci. 2017 May 11;18(5):1029. doi: 10.3390/ijms18051029.

Abstract

Protein-protein interactions (PPIs) are essential to most biological processes in living organisms. Detecting PPIs is therefore extremely important for understanding the molecular mechanisms of biological systems. Although a large amount of PPI data has been generated by high-throughput technologies for a variety of organisms, the whole interactome is still far from complete. In addition, high-throughput technologies for detecting PPIs have some unavoidable drawbacks, including long experimental time, high cost, and high error rates. In recent years, with the development of machine learning, computational methods have been widely used to predict PPIs and can achieve good prediction accuracy. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, the ZM descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). A PCVM classifier is then used to infer the interactions among proteins. When applied to the PPI datasets of Yeast and H. pylori, the proposed method achieves average prediction accuracies of 94.48% and 91.25%, respectively. To further evaluate the performance of the proposed method, the state-of-the-art support vector machine (SVM) classifier is used for comparison with the PCVM model. Experimental results on the Yeast dataset show that the PCVM classifier outperforms the SVM classifier. The experimental results indicate that the proposed method is robust, powerful, and feasible, and can serve as a helpful tool for proteomics research.
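
As an illustration of the feature-extraction step described above, the following is a minimal sketch of how Zernike moment magnitudes might be computed from a PSSM treated as a two-dimensional image. It is not the authors' implementation: the function names `radial_poly` and `zernike_features`, the maximum order of 6, the mapping of the L x 20 matrix onto the unit disk, and the discrete normalisation are all assumptions made for this example, and the random matrix only stands in for a real PSI-BLAST PSSM.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho); requires n - |m| even and |m| <= n."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coef = ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + m) // 2 - s)
                   * factorial((n - m) // 2 - s)))
        out += coef * rho ** (n - 2 * s)
    return out


def zernike_features(matrix, max_order=6):
    """Magnitudes |Z_nm| for 0 <= n <= max_order, m >= 0, n - m even.

    Assumption: the matrix is rescaled onto the square [-1, 1] x [-1, 1]
    and only cells falling inside the unit disk contribute to the sum.
    """
    mat = np.asarray(matrix, dtype=float)
    rows, cols = mat.shape
    y = np.linspace(-1.0, 1.0, rows)[:, None]   # one coordinate per row
    x = np.linspace(-1.0, 1.0, cols)[None, :]   # one coordinate per column
    rho = np.sqrt(x ** 2 + y ** 2)              # broadcasts to (rows, cols)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0

    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            # Complex conjugate of the basis function V_nm = R_nm * exp(j*m*theta)
            basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
            z = (n + 1) / np.pi * np.sum(mat[inside] * basis[inside])
            feats.append(abs(z))
    return np.array(feats)


# Hypothetical stand-in for a PSSM of a 30-residue protein (30 rows x 20 columns).
pssm = np.random.default_rng(0).normal(size=(30, 20))
features = zernike_features(pssm)
```

Under these assumptions each protein yields a fixed-length vector regardless of sequence length, and the vectors of the two proteins in a pair could then be concatenated before being passed to a classifier. Using magnitudes rather than the complex moments discards the phase term, which is what makes the descriptor insensitive to rotation of the input.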

Keywords: position-specific scoring matrix; probabilistic classification vector machines; proteins.

MeSH terms

  • Bacterial Proteins / classification
  • Bacterial Proteins / metabolism
  • Computational Biology / methods*
  • Databases, Protein
  • Fungal Proteins / classification
  • Fungal Proteins / metabolism
  • Helicobacter pylori / metabolism
  • Models, Statistical
  • Position-Specific Scoring Matrices
  • Protein Interaction Mapping / methods*
  • Sequence Analysis, Protein / methods*
  • Software*
  • Support Vector Machine
  • Yeasts / metabolism

Substances

  • Bacterial Proteins
  • Fungal Proteins