Machine learning to navigate fitness landscapes for protein engineering.

Journal: Current Opinion in Biotechnology
Published Date:

Abstract

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of protein function across sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.

Authors

  • Chase R Freschlin
    Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.
  • Sarah A Fahlberg
    Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA.
  • Philip A Romero
    Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA; Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA. Electronic address: promero2@wisc.edu.