An integration of deep learning with feature embedding for protein-protein interaction prediction

PeerJ. 2019 Jun 17;7:e7126. doi: 10.7717/peerj.7126. eCollection 2019.

Abstract

Protein-protein interactions (PPIs) are closely related to protein function and drug discovery. Accurately identifying PPIs therefore helps us understand the underlying molecular mechanisms and can greatly facilitate drug discovery. However, most existing computational methods for PPI prediction focus on feature extraction and feature combination, and state-of-the-art models have delivered only limited gains. In this work, we design a new residue representation method, named Res2vec, for protein sequence representation. The residue representations obtained by Res2vec describe residue-residue interactions from the raw sequence more precisely and supply more effective inputs to the downstream deep learning model. By combining effective feature embedding with powerful deep learning techniques, our method provides a general computational pipeline for inferring PPIs even when no protein structure information is available. The proposed method, DeepFE-PPI, is evaluated on the S. cerevisiae and human datasets. On the S. cerevisiae dataset, DeepFE-PPI achieves 94.78% accuracy, 92.99% recall, 96.45% precision and 89.62% Matthews correlation coefficient (MCC); on the human dataset, it achieves 98.71% accuracy, 98.54% recall, 98.77% precision and 97.43% MCC. We also evaluate DeepFE-PPI on five independent species datasets, and all results are superior to those of existing methods. These comparisons show that DeepFE-PPI, which pairs a novel residue representation method with a deep learning classification framework, predicts PPIs at an acceptable level of accuracy. The code, along with instructions to reproduce this work, is available from https://github.com/xal2019/DeepFE-PPI.
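The abstract reports accuracy, recall, precision and MCC but does not spell out their formulas. The sketch below uses the standard confusion-matrix definitions for binary PPI labels (1 = interacting pair, 0 = non-interacting pair). It is not taken from the DeepFE-PPI repository. The function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
import math
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Standard binary-classification metrics for 0/1 PPI labels.

    Hypothetical helper: 1 = interacting pair, 0 = non-interacting pair.
    A zero denominator yields 0.0 (an assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    recall = safe_div(tp, tp + fn)        # sensitivity: TP / (TP + FN)
    precision = safe_div(tp, tp + fp)     # TP / (TP + FP)
    # Matthews correlation coefficient, in [-1, 1]
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels: TP = 3, TN = 3, FP = 1, FN = 1
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
    # → {'accuracy': 0.75, 'recall': 0.75, 'precision': 0.75, 'mcc': 0.5}
```

The toy example shows why the paper reports MCC alongside accuracy. MCC uses all four confusion-matrix cells, so it falls to 0.5 here even though accuracy, recall and precision are all 0.75.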

Keywords: Deep learning; Feature embedding; Machine learning; Protein–protein interaction.

Grants and funding

This work was supported by the Anhui Provincial Natural Science Foundation under Grant 1708085QF143. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.