Learning protein fitness landscapes with deep mutational scanning data from multiple sources.

Journal: Cell Systems
Published Date:

Abstract

A key requirement of machine learning-assisted directed evolution (MLDE) is accurately learning the fitness landscape, a conceptual mapping from sequence variants to the desired function. Here, we describe a multi-protein training scheme that leverages existing deep mutational scanning data from diverse proteins to help learn the fitness landscape of a new protein. Proof-of-concept trials are designed to validate this training scheme in three settings: random and positional extrapolation for single-variant effects, zero-shot fitness prediction for new proteins, and extrapolation from single-variant effects to higher-order variant effects. Moreover, our study identifies previously overlooked strong baselines, whose unexpectedly good performance draws attention to the pitfalls of MLDE. Overall, these results may improve our understanding of the association between different protein fitness profiles and shed light on developing better machine learning-assisted approaches to the directed evolution of proteins. A record of this paper's transparent peer review process is included in the supplemental information.

Authors

  • Lin Chen
    College of Sports, Nanjing Tech University, Nanjing, China.
  • Zehong Zhang
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China; University of Chinese Academy of Sciences, No.19A Yuquan Road, Beijing 100049, China.
  • Zhenghao Li
    School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
  • Rui Li
    Department of Oncology, Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, China.
  • Ruifeng Huo
    School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China.
  • Lifan Chen
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China.
  • Dingyan Wang
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Xiaomin Luo
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Kaixian Chen
    Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
  • Cangsong Liao
    University of Chinese Academy of Sciences, Beijing 100049, China; Chemical Biology Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China. Electronic address: csliao@simm.ac.cn.
  • Mingyue Zheng
    School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China.