Fast algorithm for population-based protein structural model analysis

Jingfen Zhang; Dong Xu

doi:10.1002/pmic.201200334

Proteomics. 2013 Jan;13(2):221-9. doi: 10.1002/pmic.201200334. Epub 2013 Jan 3.

Authors

Jingfen Zhang¹, Dong Xu

Affiliation

¹ Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65201, USA.

Abstract

De novo protein structure prediction often generates a large population of candidate structures (models) and then selects near-native models through clustering. Existing structural model clustering methods are time-consuming because they calculate pairwise distances between models. In this paper, we present a novel method for fast model clustering without loss of clustering accuracy. Instead of the commonly used pairwise root mean square deviation (RMSD) and TM-score values, we propose two new distance measures, Dscore1 and Dscore2, based on comparison of the protein distance matrices, for describing the difference and the similarity among models, respectively. The analysis indicates that Dscore1 correlates highly with RMSD and that Dscore2 correlates highly with TM-score. Compared with existing methods, whose calculation time is quadratic in the number of models, our Dscore1-based clustering achieves linear time complexity while obtaining almost the same accuracy for near-native model selection. By using Dscore2 to select cluster representatives, we can further improve the quality of the representatives with little increase in computing time. In addition, for large populations (~500,000 models), we can provide a fast data visualization based on the Dscore distribution in seconds to minutes. Our method has been implemented in a package named MUFOLD-CL, available at http://mufold.org/clustering.php.
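
The abstract does not give the definitions of Dscore1 or Dscore2, so the following is only a rough sketch of the general idea it describes: comparing models through their intra-model distance matrices and avoiding all-against-all comparisons. It assumes each model is a list of 3D coordinates (for example, one point per residue) and that all models have the same length. The function names, the root-mean-square difference used as a score, and the choice of the population mean matrix as the single reference are illustrative assumptions, not the MUFOLD-CL implementation.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Flatten the upper triangle of a model's intra-model distance matrix.

    Distances between points do not change under rotation or translation,
    so models can be compared without superimposing them first.
    """
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def matrix_rms_difference(u, v):
    """Root-mean-square difference between two flattened distance matrices.

    Hypothetical stand-in for a distance-matrix-based difference score;
    this is NOT the published Dscore1 formula.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)) / len(u))


def rank_by_distance_to_mean(models):
    """Rank models by how far their distance matrix is from the population mean.

    Every model is compared once against a single reference (the mean
    distance matrix), so the work grows linearly with the number of
    models rather than quadratically as in pairwise comparison.
    """
    vectors = [distance_vector(m) for m in models]
    n = len(vectors)
    mean = [sum(column) / n for column in zip(*vectors)]
    scores = [matrix_rms_difference(v, mean) for v in vectors]
    order = sorted(range(n), key=scores.__getitem__)
    return order, scores


if __name__ == "__main__":
    # Toy population: two identical shapes (one translated) and one stretched outlier.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    model_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]
    model_c = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    order, scores = rank_by_distance_to_mean([model_a, model_b, model_c])
    print(order, scores)
```

In this toy population the translated copy receives the same score as the original, and the stretched model ranks last. One known limitation of any distance-matrix comparison is that a structure and its mirror image have identical distance matrices, so they cannot be told apart this way.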

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms*
Cluster Analysis
Computational Biology / methods*
Databases, Protein
Models, Chemical*
Models, Molecular
Protein Conformation
Proteins / chemistry*

Substances

Proteins
