ActiMut-XGB: Predicting thermodynamic stability of point mutations for CALB with protein language model.

Journal: International journal of biological macromolecules
Published Date:

Abstract

Predicting the functional impact of single-point mutations on protein residual activity, especially after high-temperature incubation, is critical in protein engineering. We present an innovative machine learning model based on eXtreme Gradient Boosting that leverages protein sequence data to predict thermostability, circumventing the need for three-dimensional structural information. Our model integrates features from the ESM2 language model, physicochemical properties, evolutionary features, and positional features. A key advancement is the use of transfer learning with thermal stability data from various proteins, which enhances prediction accuracy and generalizability. To fine-tune and validate the model, we used experimental data from Candida antarctica lipase B single-point mutants, a widely studied enzyme in biocatalysis and industrial applications. Despite potential limitations of Gibbs free energy values in capturing all factors influencing thermostability, our model represents a significant improvement over traditional approaches, providing valuable insights for protein engineering, enzyme optimization, and therapeutic protein development.
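
As a rough illustration of the sequence-only feature assembly described above, the following minimal sketch builds a small physicochemical and positional feature vector for one point mutation. It is not the authors' implementation: the function names, the choice of the Kyte-Doolittle hydropathy scale, the four-element feature layout, and the toy sequence are all assumptions made for illustration. In the described model, such hand-crafted features would be combined with ESM2 embeddings and evolutionary features before being passed to a gradient-boosted regressor; those parts are omitted here.

```python
import re

# Kyte-Doolittle hydropathy scale. Using this particular scale is an
# assumption; the abstract does not say which physicochemical
# properties the model uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

AMINO_ACIDS = "".join(KYTE_DOOLITTLE)
MUTATION_PATTERN = re.compile(rf"([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])")


def parse_mutation(mutation):
    """Split a mutation string such as 'A3V' into (wild type, 1-based position, mutant)."""
    match = MUTATION_PATTERN.fullmatch(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def mutation_features(sequence, mutation):
    """Return [wild-type hydropathy, mutant hydropathy, difference, relative position].

    Hypothetical feature layout, chosen only to show how physicochemical
    and positional features can be derived from sequence alone.
    """
    wild_type, position, mutant = parse_mutation(mutation)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"sequence has {sequence[position - 1]!r} at position {position}, "
            f"not {wild_type!r}"
        )
    wt_hydropathy = KYTE_DOOLITTLE[wild_type]
    mut_hydropathy = KYTE_DOOLITTLE[mutant]
    return [
        wt_hydropathy,
        mut_hydropathy,
        mut_hydropathy - wt_hydropathy,
        position / len(sequence),
    ]


if __name__ == "__main__":
    # Toy five-residue sequence, not the CALB sequence.
    print(mutation_features("MKALV", "A3V"))
```

A vector like this could be concatenated with a per-residue language-model embedding to form one training row per mutant; how the actual model weights or selects these feature groups is not stated in the abstract.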

Authors

  • Yuxin Jiang
    Department of Ultrasound, Chinese Academy of Medical Sciences and Peking Union Medical College Hospital, Beijing, China.
  • Shuai Huang
    Department of Industrial and Systems Engineering, University of Washington, Seattle, WA 98195 USA.
  • Hai-Feng Chen
    State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.