ProStab: Prediction of protein stability change upon mutations by protein language and inverse folding models
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Predicting protein stability change upon mutation is critical for protein engineering, yet it remains limited by the modeling assumptions of physics-based methods and the generalization bottlenecks of data-driven approaches. We present ProStab, a deep learning framework that integrates sequence- and structure-based information, including mutation-aware sequence embeddings from protein language models and geometric features extracted by an inverse folding model. Trained on the large-scale Megascale dataset, ProStab performs strongly across diverse test sets and generalizes robustly under distribution shifts between the training and test sets. In head-to-head comparisons, ProStab outperforms all state-of-the-art methods, with consistently higher Spearman correlation and precision. To evaluate its practical utility, we experimentally validated ProStab-predicted mutations on the model enzyme transaminase. Among the 16 successfully expressed variants, 4 exhibited improved thermal stability. Remarkably, the top-ranked predicted mutation yielded the highest observed enzymatic activity, retaining three times that of the wild type after 10 minutes at 40 °C. To facilitate broader application, we have developed a publicly accessible web server. We envisage that ProStab will provide a scalable and accurate platform for intelligent protein stability design.
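The abstract reports Spearman correlation and precision but does not define how they are computed. The following is a minimal sketch of how such metrics are commonly evaluated for stability predictors, not the authors' evaluation code. It assumes that larger values mean a more stabilizing mutation and that "precision" is the fraction of predicted-stabilizing mutations that are also measured as stabilizing. The function names, the threshold of 0.0, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def precision_stabilizing(
    predicted: Sequence[float],
    measured: Sequence[float],
    threshold: float = 0.0,  # hypothetical cutoff, not from the paper
) -> float:
    """Fraction of mutations predicted above threshold that are also measured above it."""
    called = [m for p, m in zip(predicted, measured) if p > threshold]
    if not called:
        return float("nan")
    return sum(m > threshold for m in called) / len(called)


# Toy values invented for illustration only (assumed sign: positive = stabilizing).
measured = [1.2, -0.5, 0.3, -2.0, 0.8]
predicted = [0.9, -0.1, -0.2, -1.5, 0.4]

rho = spearman(predicted, measured)
prec = precision_stabilizing(predicted, measured)
print(f"Spearman: {rho:.2f}, precision: {prec:.2f}")
```

Spearman correlation measures how well the predicted ranking of mutations matches the measured ranking, which is what matters when choosing top candidates for experiments. Precision measures how many of the mutations called stabilizing would actually pay off in the lab.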