A Bayesian statistical approach of improving knowledge-based scoring functions for protein-ligand interactions

J Comput Chem. 2014 May 5;35(12):932-43. doi: 10.1002/jcc.23579. Epub 2014 Mar 13.

Abstract

Knowledge-based scoring functions are widely used for assessing putative complexes in protein-ligand and protein-protein docking and for structure prediction. Even with large training sets, knowledge-based scoring functions face the inevitable problem of sparse data. Here, we have developed a novel approach for handling the sparse data problem that is based on estimating the inaccuracies in knowledge-based scoring functions. This inaccuracy estimation is used to automatically weight the knowledge-based scoring function with an alternative, force-field-based potential (FFP) that does not rely on training data and can, therefore, provide an improved approximation of the interactions between rare chemical groups. The current version of STScore, a protein-ligand scoring function using our method, achieves a binding mode prediction success rate of 91% on the set of 100 complexes compiled by Wang et al., and a correlation of 0.514 between predicted and experimentally determined binding affinities in PDBbind. The method presented here may be used with other FFPs and other knowledge-based scoring functions and can also be applied to protein-protein docking and protein structure prediction.
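The abstract does not give STScore's formulas, so the following is only a minimal sketch of the general idea under stated assumptions: the knowledge-based (KB) energy is taken to be an inverse-Boltzmann term built from observed and expected atom-pair counts, its inaccuracy is estimated from Poisson counting noise, and the KB and FFP values are combined by precision weighting (the posterior mean when the FFP value is treated as a Gaussian prior and the KB value as a Gaussian measurement). The names `kb_energy`, `kb_variance`, `combined_energy`, the pseudocount and the `ffp_variance` parameter are hypothetical illustrations, not the authors' method.

```python
import math

# Hypothetical constant for illustration only (not taken from STScore).
KT = 0.593  # RT in kcal/mol at about 298 K


def kb_energy(n_obs: float, n_exp: float, kt: float = KT) -> float:
    """Inverse-Boltzmann knowledge-based energy for one atom-pair distance bin.

    A pseudocount of 1 keeps the logarithm finite when a pair was never observed.
    """
    return -kt * math.log((n_obs + 1.0) / (n_exp + 1.0))


def kb_variance(n_obs: float, kt: float = KT) -> float:
    """Approximate variance of kb_energy caused by Poisson counting noise.

    By the delta method, Var[ln N] is about 1/N (N pseudocounted here),
    so Var[E_KB] is about kt**2 / N: rare pairs get a large variance.
    """
    return kt * kt / (n_obs + 1.0)


def combined_energy(n_obs: float, n_exp: float, e_ffp: float,
                    ffp_variance: float, kt: float = KT) -> float:
    """Precision-weighted mean of the KB and force-field (FFP) energies.

    Sparse bins (large KB variance) defer to the force field;
    well-sampled bins keep essentially the KB value.
    """
    e_kb = kb_energy(n_obs, n_exp, kt)
    w_kb = 1.0 / kb_variance(n_obs, kt)
    w_ffp = 1.0 / ffp_variance  # assumed, fixed confidence in the FFP
    return (w_kb * e_kb + w_ffp * e_ffp) / (w_kb + w_ffp)


if __name__ == "__main__":
    # Well-sampled pair: result stays close to the KB energy.
    print(combined_energy(n_obs=5000, n_exp=4000, e_ffp=-2.0, ffp_variance=1.0))
    # Rare pair: result is pulled toward the force-field energy.
    print(combined_energy(n_obs=0, n_exp=0.5, e_ffp=-2.0, ffp_variance=0.05))
```

Because the weight on the KB term grows with the number of observations, the blend shifts smoothly from the force field to the statistics as training data accumulate, with no hand-tuned cutoff for what counts as a "rare" chemical group.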

Keywords: knowledge-based scoring function; ligand interactions; molecular docking; protein; sparse data.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • Ligands
  • Models, Molecular
  • Proteins / chemistry*

Substances

  • Ligands
  • Proteins