Unified rational protein engineering with sequence-based deep representation learning.

Journal: Nature Methods
Published Date:

Abstract

Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with the state-of-the-art methods. UniRep further enables two orders of magnitude efficiency improvement in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.

Authors

  • Ethan C Alley
    Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
  • Grigory Khimulya
  • Surojit Biswas
    Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
  • Mohammed AlQuraishi
    Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA. Electronic address: alquraishi@hms.harvard.edu.
  • George M Church
    Wyss Institute for Biologically Inspired Engineering, Boston, Massachusetts 02115, United States.