Machine-learning-guided directed evolution for protein engineering.

Journal: Nature Methods
Published Date:

Abstract

Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.
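
The learn-then-select loop described above can be illustrated with a minimal sketch. It is not the method from the review: it assumes a simple additive (site-independent) sequence-function model, and the sequences, fitness values, and function names (`fit_additive_model`, `predict`, `propose`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(variants):
    """Fit a toy additive model from (sequence, fitness) pairs.

    Each (position, residue) effect is the mean fitness of the variants
    carrying that residue minus the overall mean fitness.
    """
    mean = sum(fitness for _, fitness in variants) / len(variants)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in variants:
        for pos, aa in enumerate(seq):
            totals[(pos, aa)] += fitness - mean
            counts[(pos, aa)] += 1
    effects = {key: totals[key] / counts[key] for key in totals}
    return mean, effects


def predict(model, seq):
    """Predict fitness as the overall mean plus the per-site effects."""
    mean, effects = model
    # Unseen (position, residue) pairs contribute 0: no evidence either way.
    return mean + sum(effects.get((pos, aa), 0.0) for pos, aa in enumerate(seq))


def propose(model, measured, alphabet_per_position, top_k=3):
    """Rank all unmeasured combinations and return the top_k predictions."""
    seen = {seq for seq, _ in measured}
    candidates = ("".join(combo) for combo in product(*alphabet_per_position))
    unseen = [seq for seq in candidates if seq not in seen]
    return sorted(unseen, key=lambda s: predict(model, s), reverse=True)[:top_k]


# Made-up data: a three-residue "protein" with four characterized variants.
measured = [("AAA", 1.0), ("GAA", 1.5), ("AVA", 0.8), ("AAL", 1.4)]
model = fit_additive_model(measured)

# Restrict the search to residues already observed at each position.
alphabet = [sorted({seq[i] for seq, _ in measured}) for i in range(3)]
next_round = propose(model, measured, alphabet)
print(next_round)
```

In a real campaign the proposed sequences would be synthesized and assayed, the new measurements appended to `measured`, and the model refit for the next round. An additive model ignores epistasis between sites, so it serves only as a baseline against which richer models can be compared.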

Authors

  • Kevin K Yang
    Division of Chemistry and Chemical Engineering; California Institute of Technology; Pasadena, California; United States of America.
  • Zachary Wu
    Division of Chemistry and Chemical Engineering; California Institute of Technology; Pasadena, California; United States of America.
  • Frances H Arnold
    Division of Biology and Biological Engineering; California Institute of Technology; Pasadena, California; United States of America.