Deep learning for mining protein data

Qiang Shi; Weiya Chen; Siqi Huang; Yan Wang; Zhidong Xue

doi:10.1093/bib/bbz156

Brief Bioinform. 2021 Jan 18;22(1):194-218. doi: 10.1093/bib/bbz156.

Authors

Qiang Shi¹, Weiya Chen², Siqi Huang³, Yan Wang⁴, Zhidong Xue⁵

Affiliations

¹ School of Software Engineering, Huazhong University of Science and Technology. His main interests cover machine learning (especially deep learning), protein data analysis, and big data mining.
² School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, virtual reality, and data visualization.
³ Software Engineering at Huazhong University of Science and Technology, focusing on machine learning and data mining.
⁴ School of life, University of Science & Technology; her main interests cover protein structure and function prediction and big data mining.
⁵ School of Software Engineering, Huazhong University of Science & Technology, Wuhan, China. His research interests cover bioinformatics, machine learning, and image processing.

PMID: 31867611
DOI: 10.1093/bib/bbz156

Abstract

The recent emergence of deep learning for characterizing complex patterns in protein big data reveals its potential to address the classic challenges in the field of protein data mining. Much research has revealed the promise of deep learning as a powerful tool to transform protein big data into valuable knowledge, leading to scientific discoveries and practical solutions. In this review, we summarize recent publications on deep learning predictive approaches in the field of mining protein data. The architectures applied in these methods include multilayer perceptrons, stacked autoencoders, deep belief networks, two- or three-dimensional convolutional neural networks, recurrent neural networks, graph neural networks, and complex neural networks, and are described from five perspectives: residue-level prediction, sequence-level prediction, three-dimensional structural analysis, interaction prediction, and mass spectrometry data mining. The advantages and deficiencies of these architectures are presented in relation to various tasks in protein data mining. Additionally, some practical issues and their future directions are discussed, such as robust deep learning for noisy protein data, architecture optimization for specific tasks, efficient deep learning for limited protein data, multimodal deep learning for heterogeneous protein data, and interpretable deep learning for protein understanding. This review provides comprehensive perspectives on general deep learning techniques for protein data analysis.

Keywords: 3D-structure prediction; deep learning; interaction prediction; protein big data; protein mass spectrometry; residue-level prediction; sequence-level prediction.

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Animals
Data Mining / methods*
Databases, Protein
Deep Learning*
Humans
Sequence Analysis, Protein / methods*