Prediction of drug-target interaction based on protein features using undersampling and feature selection techniques with boosting

Anal Biochem. 2020 Jan 15:589:113507. doi: 10.1016/j.ab.2019.113507. Epub 2019 Nov 15.

Abstract

Accurate identification of drug-target interaction (DTI) is a crucial and challenging task in the drug discovery process, with enormous benefit to patients and pharmaceutical companies. Traditional wet-lab experiments for identifying DTI are expensive, time-consuming, and labor-intensive. Many computational techniques have therefore been developed for this purpose, yet a huge number of interactions remain undiscovered. Here, we present pdti-EssB, a new computational model for identifying DTI from protein sequence and drug molecular structure. More specifically, each drug molecule is encoded as a molecular substructure fingerprint. For each protein sequence, different descriptors are used to represent its evolutionary, sequence, and structural information. In addition, the proposed method uses data balancing techniques to handle the class imbalance problem and applies a novel feature eliminator to select the optimal features for accurate prediction. Four classes of DTI benchmark datasets are used to construct a predictive model with XGBoost. The auROC is used as the evaluation metric to compare the performance of pdti-EssB with recent methods under five-fold cross-validation. The experimental results indicate that the proposed method outperforms other approaches in predicting DTI and proposes new candidate drug-target interactions based on prediction probability scores. The pdti-EssB webserver is available online at http://pdtiessb-uestc.com/.
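
The balancing and evaluation steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the pdti-EssB implementation: the abstract does not specify which undersampling algorithm or feature eliminator is used, so plain random undersampling stands in for the balancing step, and the classifier itself (XGBoost in the paper) is left out. The function names `random_undersample`, `stratified_folds`, and `auroc` are hypothetical helpers introduced only for this illustration.

```python
import random


def random_undersample(X, y, seed=0):
    """Balance a binary dataset by randomly dropping majority-class samples.

    Assumption: simple random undersampling, used here as a stand-in for
    the (unspecified) balancing technique mentioned in the abstract.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority sample and an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin over the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("auROC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a five-fold setting, each fold from `stratified_folds` would serve once as the held-out test set; undersampling would be applied to the training folds only, so that the auROC on the held-out fold reflects the original class ratio.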

Keywords: Data imbalance; Drug-target interaction; Feature extraction; Feature selection; Molecular substructure fingerprint; XGBoost classifier.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods
  • Computer Simulation*
  • Databases, Protein
  • Datasets as Topic
  • Drug Discovery / methods*
  • Models, Molecular*
  • Pharmaceutical Preparations / metabolism*
  • Protein Binding
  • Protein Domains
  • Proteins / metabolism*
  • Structure-Activity Relationship

Substances

  • Pharmaceutical Preparations
  • Proteins