Predicting Drug-Target Interactions Based on the Ensemble Models of Multiple Feature Pairs

Int J Mol Sci. 2021 Jun 20;22(12):6598. doi: 10.3390/ijms22126598.

Abstract

Background: The prediction of drug-target interactions (DTIs) is of great significance in drug development. Traditional experimental methods are time-consuming and expensive. Machine learning can reduce the cost of prediction, but it is limited by imbalanced datasets and by the difficulty of selecting essential features.

Methods: A prediction method based on the Ensemble model of Multiple Feature Pairs (Ensemble-MFP) is introduced. First, three negative sets are generated according to the Euclidean distance of three feature pairs. Then, the negative samples of the validation set and of the test set are each randomly selected from the union of the three negative sets within that set. The weighted ensemble model is optimized and then applied to the test set.
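The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the Euclidean distance is used, so the code assumes that the unlabeled pairs farthest from their nearest known positive are taken as negatives. The function names (select_negatives, sample_from_union, weighted_score), the toy vectors and the weights are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_negatives(positives, unlabeled, k):
    """Indices of the k unlabeled vectors farthest from their nearest positive.

    Assumption: 'far from every known positive' is used as the criterion
    for a reliable negative; the abstract does not specify the rule.
    """
    def min_dist(u):
        return min(euclidean(u, p) for p in positives)

    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: min_dist(unlabeled[i]),
                    reverse=True)
    return set(ranked[:k])


def sample_from_union(negative_sets, n, seed=0):
    """Randomly draw n negatives from the union of the per-feature-pair sets."""
    union = sorted(set().union(*negative_sets))
    rng = random.Random(seed)
    return rng.sample(union, min(n, len(union)))


def weighted_score(scores, weights):
    """Weighted sum of per-model scores, normalized by the total weight."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy data: one positive and four unlabeled pairs in a single feature space.
positives = [[0.0, 0.0]]
unlabeled = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0], [9.0, 9.0]]
negatives_a = select_negatives(positives, unlabeled, k=2)

# Pretend two other feature pairs produced these (hypothetical) negative sets.
negatives_b = {3, 2}
negatives_c = {0}
drawn = sample_from_union([negatives_a, negatives_b, negatives_c], n=2)

# Combine three per-feature-pair model scores with hypothetical weights.
combined = weighted_score([0.9, 0.5, 0.1], [0.5, 0.3, 0.2])
```

In practice the weights would be tuned on the validation set before the combined model is applied to the test set; the fixed values here only show the arithmetic.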

Results: In the prediction of new drugs, the area under the receiver operating characteristic curve (area under ROC, AUC) exceeded 94.0% on three of the four sub-datasets of the gold standard datasets. The effectiveness of the proposed method is also shown by comparison with state-of-the-art methods and by a demonstration of predicted drug-target pairs.
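For readers unfamiliar with the metric, the AUC can be computed as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below is a minimal illustration with made-up labels and scores, not the evaluation code used in the paper; the function name roc_auc is hypothetical. An AUC reported as 94.0% corresponds to 0.94 on this scale.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up example: two interacting pairs (1) and two non-interacting pairs (0).
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; library implementations use a sorted-rank formulation instead.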

Conclusion: Ensemble-MFP can weight the existing feature pairs and achieves good predictive performance in the general setting of predicting interactions for new drugs.

Keywords: drug–target interactions; ensemble model of Multiple Feature Pairs (Ensemble-MFP); model weight sum; support vector machines.

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Drug Development / methods*
  • Drug Development / standards
  • Machine Learning
  • Models, Theoretical*
  • Reproducibility of Results
  • Support Vector Machine