Machine-learning-guided identification of protein secondary structures using spectral and structural descriptors.

Journal: Biomaterials Science
Published Date:

Abstract

Interrogation of the secondary structures of proteins is essential for designing and engineering more effective and safer protein-based biomaterials and other classes of theranostic materials. Protein secondary structures are commonly assessed using circular dichroism spectroscopy, followed by relevant downstream analysis using specialized software. As many proteins have complex secondary structures beyond the typical α-helix and β-sheet configurations, and the derived secondary structural contents are significantly influenced by the selection of software, estimations acquired through conventional methods may be less reliable. Herein, we propose the implementation of a machine-learning-based approach to improve the accuracy and reliability of the classification of protein secondary structures. Specifically, we leverage supervised machine learning to analyze the circular dichroism spectra and relevant attributes of 112 proteins to predict their secondary structures. Based on a range of spectral, structural, and molecular features, we systematically evaluate the predictive performance of numerous supervised classifiers and identify optimal combinations of algorithms with descriptors to achieve highly accurate and precise estimations of protein secondary structures. We anticipate that this work will offer a deeper insight into the development of machine-learning-based approaches to streamline the delineation of protein structures for different biological and biomedical applications.
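The evaluation strategy described in the abstract, scoring several supervised classifiers against several descriptor sets and keeping the best-performing combinations, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the spectra are synthetic Gaussian bands placed near textbook far-UV positions (α-helix minima near 208 and 222 nm, β-sheet minimum near 218 nm), the two descriptor sets are both purely spectral, and the two classifiers (3-nearest-neighbour and nearest-centroid) as well as all function names are hypothetical stand-ins. The abstract does not name the algorithms or features actually used.

```python
import math
import random

WAVELENGTHS = list(range(190, 251, 5))  # nm; coarse toy far-UV grid (assumption)

# (center nm, width nm, height) of Gaussian bands; illustrative values only.
BANDS = {
    "helix": [(193, 6, 60), (208, 6, -30), (222, 7, -30)],
    "sheet": [(195, 6, 30), (218, 9, -20)],
}

def toy_spectrum(label, rng, noise=3.0):
    """Build a synthetic CD-like spectrum for a class label (NOT real data)."""
    spectrum = []
    for wl in WAVELENGTHS:
        signal = sum(h * math.exp(-((wl - c) / w) ** 2) for c, w, h in BANDS[label])
        spectrum.append(signal + rng.gauss(0, noise))
    return spectrum

def make_dataset(n_per_class=20, seed=0):
    """Return a list of (spectrum, label) pairs from a seeded generator."""
    rng = random.Random(seed)
    return [(toy_spectrum(lab, rng), lab) for lab in BANDS for _ in range(n_per_class)]

# Two hypothetical descriptor sets derived from each spectrum.
def full_spectrum(spec):
    return spec

def two_point(spec):
    return [spec[WAVELENGTHS.index(210)], spec[WAVELENGTHS.index(220)]]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, x, k=3):
    """Majority vote among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = [lab for _, lab in nearest]
    return max(set(votes), key=votes.count)

def centroid_predict(train, x):
    """Assign the label whose mean feature vector is closest to x."""
    groups = {}
    for feats, lab in train:
        groups.setdefault(lab, []).append(feats)
    centroids = {
        lab: [sum(col) / len(col) for col in zip(*rows)]
        for lab, rows in groups.items()
    }
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

DESCRIPTORS = {"full spectrum": full_spectrum, "two-point": two_point}
CLASSIFIERS = {"3-NN": knn_predict, "nearest centroid": centroid_predict}

def loo_accuracy(data, featurize, predict):
    """Leave-one-out accuracy: hold out each sample once, train on the rest."""
    feats = [(featurize(spec), lab) for spec, lab in data]
    hits = 0
    for i, (x, lab) in enumerate(feats):
        train = feats[:i] + feats[i + 1:]
        hits += predict(train, x) == lab
    return hits / len(feats)

def evaluate(data):
    """Score every (descriptor set, classifier) combination."""
    return {
        (d_name, c_name): loo_accuracy(data, featurize, predict)
        for d_name, featurize in DESCRIPTORS.items()
        for c_name, predict in CLASSIFIERS.items()
    }

if __name__ == "__main__":
    ranked = sorted(evaluate(make_dataset()).items(), key=lambda kv: -kv[1])
    for (d_name, c_name), acc in ranked:
        print(f"{d_name:>14} + {c_name:<16} accuracy = {acc:.2f}")
```

Leave-one-out validation is used here only because it suits small datasets; the abstract does not state which validation scheme was applied to the 112 proteins. In a real analysis the toy generator would be replaced by measured spectra, and structural or molecular descriptors could be added as further entries in `DESCRIPTORS`.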

Authors

  • Ziqi Wang
    The Center for Ion Beam Bioengineering & Green Agriculture, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China.
  • Kenry
    Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117585.