Multi-class protein fold recognition using support vector machines and neural networks

C H Ding; I Dubchak

doi:10.1093/bioinformatics/17.4.349

Bioinformatics. 2001 Apr;17(4):349-58. doi: 10.1093/bioinformatics/17.4.349.

Authors

C H Ding¹, I Dubchak

Affiliation

¹ NERSC Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA.

PMID: 11301304
DOI: 10.1093/bioinformatics/17.4.349

Abstract

Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system.

Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.
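The all-against-all scheme and the majority voting mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the tie-breaking rule or the trained classifiers, so the first-seen tie-break, the toy `centres` dictionary, and the `nearest_centre` stand-in for a trained two-class SVM or NN are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first (an assumption)."""
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def all_against_all_predict(classes, pairwise_decide, x):
    """Run one two-class decision per pair of classes, then vote over the winners."""
    votes = [pairwise_decide(a, b, x) for a, b in combinations(classes, 2)]
    return majority_vote(votes)


# Hypothetical stand-in for trained pairwise classifiers:
# each fold is represented by a made-up 1-D centre.
centres = {"fold_a": 0.0, "fold_b": 5.0, "fold_c": 10.0}


def nearest_centre(a, b, x):
    """Pick whichever of the two folds has the closer centre."""
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


# All-against-all: 3 folds give 3 pairwise decisions, combined by voting.
print(all_against_all_predict(list(centres), nearest_centre, 6.0))

# The same vote can combine predictions made from several parameter datasets.
print(majority_vote(["fold_b", "fold_a", "fold_b"]))
```

With K folds this scheme needs K(K-1)/2 two-class classifiers (351 for the 27 folds in the abstract), compared with K classifiers for one-against-others.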

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Discriminant Analysis
Neural Networks, Computer*
Protein Folding*
Proteins / chemistry*
Proteins / classification

Substances

Proteins