Scoring-Assisted Generative Exploration for Proteins (SAGE-Prot): A Framework for Multi-Objective Protein Optimization via Iterative Sequence Generation and Evaluation
Journal: arXiv
Published Date: May 2, 2025
Abstract
Proteins play essential roles in nature, from catalyzing biochemical
reactions to binding specific targets. Advances in protein engineering have the
potential to revolutionize biotechnology and healthcare by designing proteins
with tailored properties. Machine learning and generative models have
transformed protein design by enabling the exploration of vast
sequence-function landscapes. Here, we introduce Scoring-Assisted Generative
Exploration for Proteins (SAGE-Prot), a framework that iteratively combines
autoregressive protein generation with quantitative structure-property
relationship models for fine-tuned optimization. By integrating diverse protein
descriptors, SAGE-Prot enhances key properties, including binding affinity,
thermal stability, enzymatic activity, and solubility. We demonstrate its
effectiveness by optimizing GB1 for binding affinity and thermal stability and
TEM-1 for enzymatic activity and solubility. Leveraging curriculum learning,
SAGE-Prot adapts rapidly to increasingly complex design objectives, building on
past successes. Experimental validation demonstrated that SAGE-Prot-generated
proteins substantially outperformed their wild-type counterparts, achieving up
to a 17-fold increase in beta-lactamase activity, underscoring SAGE-Prot's
potential to tackle critical challenges in protein engineering. As generative
models continue to evolve, approaches like SAGE-Prot will be indispensable for
advancing rational protein design.
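
The iterative generate-and-score idea described in the abstract can be illustrated with a deliberately simplified sketch. Nothing below is taken from the SAGE-Prot implementation: `toy_generate` is a hypothetical stand-in for the autoregressive generator (it only applies random point substitutions), `toy_score` is a hypothetical stand-in for a quantitative structure-property relationship model (it only counts lysine residues), and the seed sequence, round count, and pool size are arbitrary assumptions. The sketch also omits model fine-tuning and curriculum learning; it shows only the loop in which scored candidates seed the next round.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq, target="K"):
    """Hypothetical stand-in for a QSPR model: fraction of residues equal to `target`."""
    return seq.count(target) / len(seq)


def toy_generate(parents, n_children, rng):
    """Hypothetical stand-in for a generative model: one random point substitution per child."""
    children = []
    for _ in range(n_children):
        parent = rng.choice(parents)
        pos = rng.randrange(len(parent))
        children.append(parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:])
    return children


def explore(seed_seq, rounds=20, n_children=50, top_k=5, seed=0):
    """Alternate generation and scoring, keeping the top_k sequences after each round."""
    rng = random.Random(seed)
    pool = [seed_seq]
    history = []  # best score observed after each round
    for _ in range(rounds):
        # Keep the current pool among the candidates so the best score never drops.
        candidates = set(pool) | set(toy_generate(pool, n_children, rng))
        # Sort by score, breaking ties by sequence so the result is deterministic.
        ranked = sorted(candidates, key=lambda s: (toy_score(s), s), reverse=True)
        pool = ranked[:top_k]
        history.append(toy_score(pool[0]))
    return pool, history


if __name__ == "__main__":
    pool, history = explore("MKTAYIAKQR")  # arbitrary illustrative sequence
    print(pool[0], history[-1])
```

Because the previous pool is retained among the candidates in every round, the best score in this toy loop can never decrease. In a realistic setting the scoring function would combine several property predictors, and the generator would be retrained on the selected sequences rather than replaced by random mutation.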