Sparse group selection and analysis of function-related residue for protein-state recognition.

Journal: Journal of Computational Chemistry
Published Date:

Abstract

Machine learning methods have helped advance a wide range of scientific and technological fields in recent years, including computational chemistry. Because chemical systems can be complex and high-dimensional, feature selection is critical but challenging when developing reliable machine learning based prediction models, especially for proteins as bio-macromolecules. In this study, we applied the sparse group lasso (SGL) method as a general feature selection method to develop a classification model for an allosteric protein in different functional states. This results in a much improved model with comparable accuracy (Acc) and only 28 selected features, compared to 289 selected features in a previous study. The Acc reaches 91.50% with 1936 selected features, which is far higher than that of the baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational works on the model protein, and they demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods to complex chemical systems.
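
To illustrate how a sparse group lasso penalty can zero out whole groups of features while also thinning features inside the groups it keeps, the following is a minimal sketch of the proximal (shrinkage) step for that penalty. It is not the authors' implementation. The function names, the mixing parameter `alpha`, the strength `lam`, and the square-root-of-group-size weighting are assumptions taken from the common textbook formulation, and the groups here are arbitrary index lists rather than the secondary-structure groups used in the study.

```python
import math


def soft_threshold(x, t):
    """Shrink a single value toward zero by t (lasso-style shrinkage)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal step for an assumed sparse group lasso penalty:

        lam * (alpha * ||beta||_1
               + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2)

    beta   : list of coefficients
    groups : list of non-overlapping index lists (hypothetical grouping)
    lam    : overall penalty strength
    alpha  : mix between the lasso part (1.0) and the group part (0.0)
    step   : step size of the surrounding proximal gradient method
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # Lasso part: shrink each coefficient in the group individually.
        shrunk = [soft_threshold(beta[i], step * lam * alpha) for i in idx]
        # Group part: shrink the whole group by its Euclidean norm.
        norm = math.sqrt(sum(v * v for v in shrunk))
        group_t = step * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - group_t / norm) if norm > 0 else 0.0
        for i, v in zip(idx, shrunk):
            out[i] = scale * v
    return out


# Toy example: one strong group and one weak group of coefficients.
coeffs = [3.0, 4.0, 0.1, -0.2]
toy_groups = [[0, 1], [2, 3]]
result = sgl_prox(coeffs, toy_groups, lam=1.0, alpha=0.5)
```

In this toy example the weak second group is removed entirely, while the strong first group survives with shrunken coefficients. That is the behavior that makes group-level selection interpretable when the groups correspond to structural units.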

Authors

  • Fangyun Bai
    Department of Management Science and Engineering, Tongji University, Shanghai, China.
  • Kin Ming Puk
    Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, USA.
  • Jin Liu
    School of Computer Science and Engineering, Central South University, Changsha, China.
  • Hongyu Zhou
    Institute for AI in Medicine and Faculty of Medicine, Macau University of Science and Technology, Macau, China; National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China.
  • Peng Tao
    Department of Chemistry, Center for Drug Discovery, Design, and Delivery (CD4), Center for Scientific Computation, Southern Methodist University, Dallas, Texas, 75275.
  • Wenyong Zhou
    Department of Management Science and Engineering, Tongji University, Shanghai, China.
  • Shouyi Wang
    Department of Industrial, Manufacturing, and Systems Engineering, The University of Texas at Arlington, 500 West First St., Arlington, TX, 76019, USA.