Machine Learning Techniques to Infer Protein Structure and Function from Sequences: A Comprehensive Review.

Journal: Methods in molecular biology (Clifton, N.J.)

Abstract

The elucidation of protein structure and function plays a pivotal role in understanding biological processes and facilitating drug discovery. With the exponential growth of protein sequence data, machine learning techniques have emerged as powerful tools for predicting protein characteristics from sequence alone. This review provides a comprehensive overview of the importance and applications of machine learning for inferring protein structure and function. We discuss various machine learning approaches, focusing primarily on convolutional neural networks and natural language processing, and their use in predicting protein secondary and tertiary structure, residue-residue contacts, protein function, and subcellular localization. We also highlight the challenges of applying machine learning in this context, such as the limited availability of high-quality training datasets and the interpretability of models. Finally, we discuss recent progress in the development of complex deep learning architectures. Overall, this review underscores the significance of machine learning in advancing our understanding of protein structure and function and its potential to revolutionize drug discovery and personalized medicine.

Authors

  • Gopal Srivastava
    Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
  • Mengmeng Liu
    Department of First Hospital, Jilin University, Changchun, China.
  • Xialong Ni
    Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA.
  • Limeng Pu
    Division of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA.
  • Michal Brylinski
    Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA.