Prediction of G-protein coupled receptors and their subfamilies by incorporating various sequence features into Chou's general PseAAC

Comput Methods Programs Biomed. 2016 Oct:134:197-213. doi: 10.1016/j.cmpb.2016.07.004. Epub 2016 Jul 9.

Abstract

Background and objective: G-protein coupled receptors form the largest superfamily of membrane proteins and are important targets for drug design. They are responsible for many physiological processes, such as smell, taste, vision, neurotransmission, metabolism, cellular growth and immune response. A robust and efficient approach for predicting G-protein coupled receptors and their subfamilies is therefore needed.

Methods: In this paper, protein samples are represented by amino acid composition, dipeptide composition, correlation features, composition, transition and distribution descriptors, sequence order descriptors and pseudo amino acid composition, giving a total of 1497 sequence-derived features. To classify G-protein coupled receptors and their subfamilies efficiently, we propose a weighted k-nearest neighbor classifier that uses the union of the best 50 features selected by five feature selection methods: Fisher score based feature selection, ReliefF, fast correlation based filter, minimum redundancy maximum relevancy, and support vector machine based recursive feature elimination. Taking the union exploits the advantages of each of these methods.
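The pipeline described in the Methods can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it covers only two of the listed descriptors (amino acid and dipeptide composition, 420 of the 1497 features), implements only the Fisher score among the five selectors, assumes that "best 50" means the top k indices taken from each ranking before the union, and assumes inverse-distance weighting for the k-nearest neighbor vote. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 features)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair (400 features)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[d] / total for d in DIPEPTIDES]


def fisher_scores(X, y):
    """Per-feature ratio of between-class to within-class scatter."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_all = sum(col) / len(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - mean_all) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def union_top_k(score_lists, k):
    """Union of the k best-scoring feature indices from each ranking
    (assumed reading of 'union of best 50 features')."""
    selected = set()
    for scores in score_lists:
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        selected.update(order[:k])
    return sorted(selected)


def weighted_knn_predict(X_train, y_train, x, k=3):
    """Vote among the k nearest neighbours, each weighted by 1/distance
    (an assumed weighting scheme)."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter()
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + 1e-9)
    return votes.most_common(1)[0][0]
```

In this sketch, each additional selector (ReliefF, mRMR, and so on) would contribute one more score list to `union_top_k`, so the selected set can hold up to k times the number of selectors and shrinks wherever the rankings agree.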

Results: Using 10-fold cross-validation, the proposed method achieved overall accuracies of 99.9%, 98.3% and 95.4%, MCC values of 1.00, 0.98 and 0.95, ROC area values of 1.00, 0.998 and 0.996, and precision values of 99.9%, 98.3% and 95.5% for predicting G-protein coupled receptors versus non-G-protein coupled receptors, subfamilies of G-protein coupled receptors, and subfamilies of class A G-protein coupled receptors, respectively.
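For reference, accuracy, precision and MCC in the two-class case (receptor versus non-receptor) can be computed from confusion-matrix counts as sketched below. This uses the standard binary definitions; the function name is hypothetical, and how the paper aggregates these metrics across multiple subfamilies is not stated in the abstract.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, MCC) from binary confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, mcc
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes are unbalanced.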

Conclusions: The high accuracy, MCC, ROC area and precision values indicate that the proposed method is effective for predicting G-protein coupled receptor families and their subfamilies.

Keywords: G-protein coupled receptors; Matthews correlation coefficient; Minimum redundancy maximum relevance; Sequence derived properties; Weighted k-nearest neighbor.

MeSH terms

  • Amino Acid Sequence
  • Humans
  • Receptors, G-Protein-Coupled / chemistry*
  • Support Vector Machine

Substances

  • Receptors, G-Protein-Coupled