Improved prediction of protein-protein binding sites using a support vector machines approach

Bioinformatics. 2005 Apr 15;21(8):1487-94. doi: 10.1093/bioinformatics/bti242. Epub 2004 Dec 21.

Abstract

Motivation: Structural genomics projects are beginning to produce protein structures with unknown function. Accurate, automated predictors of protein function are therefore required if all these structures are to be properly annotated in reasonable time. Identifying the interface between two interacting proteins provides important clues to the function of a protein and can reduce the search space required by docking algorithms to predict the structures of complexes.

Results: We have combined a support vector machine (SVM) approach with surface patch analysis to predict protein-protein binding sites. Using a leave-one-out cross-validation procedure, we were able to successfully predict the location of the binding site for 76% of the proteins in our dataset, which is made up of proteins with both transient and obligate interfaces. With heterogeneous cross-validation, where we trained the SVM on transient complexes to predict on obligate complexes (and vice versa), we still achieved success rates comparable to those of the leave-one-out cross-validation, suggesting that sufficient properties are shared between transient and obligate interfaces.
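
The two validation schemes described above differ only in how proteins are divided into training and test sets. The following is a minimal Python sketch of that bookkeeping, not the authors' implementation. The function names, the protein identifiers, and the `train` and `is_success` callables are all hypothetical; in particular, the abstract does not define the criterion for a successful prediction, so it is left as a caller-supplied function, as is the SVM training step.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Split = Tuple[List[str], List[str]]  # (training protein ids, test protein ids)


def leave_one_out_splits(protein_ids: Iterable[str]) -> Iterator[Split]:
    """Hold out each protein in turn and train on all the others."""
    ids = sorted(protein_ids)
    for held_out in ids:
        yield [p for p in ids if p != held_out], [held_out]


def heterogeneous_splits(kinds: Dict[str, str]) -> Iterator[Split]:
    """Train on one interface type and test on the other, in both directions.

    `kinds` maps a protein id to "transient" or "obligate" (assumed labels).
    """
    transient = sorted(p for p, k in kinds.items() if k == "transient")
    obligate = sorted(p for p, k in kinds.items() if k == "obligate")
    yield transient, obligate
    yield obligate, transient


def success_rate(
    splits: Iterable[Split],
    train: Callable[[List[str]], object],
    is_success: Callable[[object, str], bool],
) -> float:
    """Fraction of test proteins whose binding site is judged correctly predicted.

    `train` builds a model from the training proteins (an SVM in the paper);
    `is_success` applies the caller's own success criterion to one test protein.
    """
    outcomes: List[bool] = []
    for train_ids, test_ids in splits:
        model = train(train_ids)
        outcomes.extend(is_success(model, pid) for pid in test_ids)
    return sum(outcomes) / len(outcomes)
```

Splitting at the level of whole proteins, rather than individual surface patches, keeps every patch of a test protein out of the training set, which is the usual reason for organising the folds this way.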

Availability: A web application based on the method can be found at http://www.bioinformatics.leeds.ac.uk/ppi_pred. The dataset of 180 proteins used in this study is also available via the same web site.

Contact: westhead@bmb.leeds.ac.uk

Supplementary information: http://www.bioinformatics.leeds.ac.uk/ppi-pred/supp-material.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Binding Sites
  • Computer Simulation
  • Models, Chemical*
  • Models, Molecular*
  • Pattern Recognition, Automated / methods*
  • Protein Binding
  • Protein Conformation
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*
  • Software
  • Structure-Activity Relationship

Substances

  • Proteins