Accelerating protein engineering with fitness landscape modeling and reinforcement learning
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Protein engineering holds significant promise for designing proteins with customized functions, yet the vast space of possible mutations, set against limited laboratory capacity, constrains the discovery of optimal sequences. To address this, we present the µProtein framework, which accelerates protein engineering by combining µFormer, a deep learning model for accurate prediction of mutational effects, with µSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using µFormer as an oracle. By modeling epistatic interactions and using a multi-step search strategy, µProtein leverages single-mutation data to predict optimal sequences carrying complex, multi-amino-acid mutations. Beyond achieving state-of-the-art performance on benchmark datasets, µProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase whose activity surpassed the highest known level in wet-lab experiments. These results demonstrate µProtein’s capability to discover impactful mutations across the vast protein sequence space, offering a robust and efficient approach to protein optimization.
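The abstract describes a learned predictor serving as an oracle for a multi-step mutational search. The sketch below is a toy illustration of that general pattern only, under loudly stated assumptions: the `oracle` function is a hypothetical hand-written additive-plus-pairwise scoring rule standing in for a trained model such as µFormer, the wild-type sequence and effect values are invented, and the search is plain beam search, a swapped-in technique rather than the reinforcement learning used by µSearch. It shows why epistasis matters: in this toy, mutation 2D is deleterious on its own but strongly beneficial together with 1E, so a greedy one-mutation-at-a-time search misses a combination that a wider multi-step search finds.

```python
from itertools import combinations, product

# Toy setup: every value here is hypothetical and chosen only for illustration.
WILD_TYPE = "MKTA"
ALPHABET = "ACDEKMT"

# Additive effects of single mutations, keyed by (position, residue).
SINGLE = {(0, "A"): 0.5, (1, "E"): 0.3, (2, "D"): -0.2, (3, "K"): 0.4}
# Pairwise epistasis: 2D is harmful alone but strongly beneficial with 1E.
PAIR = {((1, "E"), (2, "D")): 1.0}
DEFAULT_EFFECT = -0.1  # any mutation not listed above is mildly deleterious


def oracle(seq: str) -> float:
    """Stand-in for a learned fitness predictor: additive terms plus pairwise epistasis."""
    muts = [(i, aa) for i, aa in enumerate(seq) if aa != WILD_TYPE[i]]
    score = sum(SINGLE.get(m, DEFAULT_EFFECT) for m in muts)
    # muts is ordered by position, so each pair matches PAIR's key order exactly once.
    score += sum(PAIR.get(pair, 0.0) for pair in combinations(muts, 2))
    return score


def beam_search(steps: int, beam_width: int) -> tuple[str, float]:
    """Add one point mutation per step, keeping the beam_width best sequences by oracle score."""
    beam = [WILD_TYPE]
    best = (WILD_TYPE, oracle(WILD_TYPE))
    for _ in range(steps):
        candidates = {
            seq[:i] + aa + seq[i + 1:]
            for seq in beam
            for i, aa in product(range(len(seq)), ALPHABET)
            if seq[i] == WILD_TYPE[i] and aa != WILD_TYPE[i]  # mutate unmutated sites only
        }
        if not candidates:
            break
        # Sort by descending score; ties broken alphabetically for determinism.
        beam = sorted(candidates, key=lambda s: (-oracle(s), s))[:beam_width]
        top_score = oracle(beam[0])
        if top_score > best[1]:
            best = (beam[0], top_score)
    return best


if __name__ == "__main__":
    print(beam_search(steps=3, beam_width=1))  # greedy: misses the epistatic pair
    print(beam_search(steps=3, beam_width=3))  # wider search: finds it
```

With three steps, the greedy run (`beam_width=1`) ends at `AETK` (score about 1.2), while `beam_width=3` reaches `AEDA` (score about 1.6), the best three-mutation combination under this toy oracle. A real system would replace the hand-written `oracle` with a trained model and could use a learned search policy instead of a fixed beam.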