Accelerating protein engineering with fitness landscape modeling and reinforcement learning

Journal: bioRxiv
Published Date:

Abstract

Protein engineering holds significant promise for designing proteins with customized functions, yet the vast space of possible mutations, set against limited laboratory capacity, constrains the discovery of optimal sequences. To address this, we present the µProtein framework, which accelerates protein engineering by combining µFormer, a deep learning model for accurate mutational effect prediction, with µSearch, a reinforcement learning algorithm designed to efficiently navigate the protein fitness landscape using µFormer as an oracle. By modeling epistatic interactions and using a multi-step search strategy, µProtein leverages single-mutation data to predict optimal sequences carrying complex, multi-amino-acid mutations. Beyond achieving state-of-the-art performance on benchmark datasets, µProtein, trained solely on single-mutation data, identified high-gain-of-function multi-point mutants of the enzyme β-lactamase whose activity in wet-lab experiments surpassed the highest known level. These results demonstrate µProtein's capability to discover impactful mutations across the vast protein sequence space, offering a robust, efficient approach to protein optimization.
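The abstract describes a search that queries a learned fitness predictor as an oracle and accumulates mutations over several steps, so that beneficial substitutions can be combined into multi-point mutants. As a rough illustration only, the sketch below swaps in a plain beam search for the reinforcement-learning policy of µSearch and a hand-written toy scoring function for µFormer. The names `toy_oracle`, `point_mutants`, and `beam_search` are hypothetical, and nothing here reflects the authors' actual implementation.

```python
from typing import Callable, Iterator, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_oracle(seq: str) -> float:
    """Hypothetical stand-in for a learned fitness predictor (NOT µFormer).

    Rewards 'W' at position 0 and 'Y' at position 2, plus an extra
    pairwise (epistatic) bonus only when both substitutions co-occur.
    """
    score = 0.0
    if seq[0] == "W":
        score += 1.0
    if seq[2] == "Y":
        score += 1.0
    if seq[0] == "W" and seq[2] == "Y":
        score += 2.0  # epistatic term: worth more than the sum of the parts
    return score


def point_mutants(seq: str) -> Iterator[str]:
    """Yield every sequence that differs from `seq` by one substitution."""
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield seq[:i] + aa + seq[i + 1:]


def beam_search(
    wild_type: str,
    oracle: Callable[[str], float],
    steps: int,
    beam_width: int,
) -> Tuple[str, float]:
    """Multi-step search: each step adds one mutation to every beam member,
    scores all candidates with the oracle, and keeps the top `beam_width`.
    Returns the best (sequence, score) seen across all steps."""
    beam = [(oracle(wild_type), wild_type)]
    best = beam[0]
    for _ in range(steps):
        candidates = {}
        for _, seq in beam:
            for mutant in point_mutants(seq):
                if mutant not in candidates:  # avoid re-scoring duplicates
                    candidates[mutant] = oracle(mutant)
        # Sort (score, sequence) pairs so the highest-scoring come first.
        beam = sorted(((s, m) for m, s in candidates.items()), reverse=True)[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[1], best[0]


if __name__ == "__main__":
    print(beam_search("AAAA", toy_oracle, steps=2, beam_width=3))
```

On this toy oracle, a one-step search from `AAAA` reaches a score of only 1.0, while two steps combine both beneficial substitutions into `WAYA` and collect the epistatic bonus (score 4.0). Exhaustive enumeration like this becomes infeasible for real proteins because the number of multi-point mutants grows combinatorially with sequence length and mutation count, which is the motivation for a learned search policy.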

Authors

  • Haoran Sun; Liang He; Pan Deng; Guoqing Liu; Zhiyu Zhao; Yuliang Jiang; Chuan Cao; Fusong Ju; Lijun Wu; Haiguang Liu; Tao Qin; Tie-Yan Liu