Protein design and variant prediction using autoregressive generative models.

Journal: Nature Communications
Published Date:

Abstract

The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity-determining regions. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the alignment-free autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.
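
The alignment-free idea in the abstract can be illustrated with a minimal sketch. An autoregressive model factorizes the probability of a sequence as a product of per-residue conditionals, p(x) = p(x_1) p(x_2 | x_1) ... p(x_L | x_1..x_{L-1}), so sequences of any length can be scored and a variant can be compared with the wild type by a log-likelihood difference. The code below is not the authors' model: it swaps the deep network for a first-order (bigram) model with additive smoothing, and the sequences, function names, and pseudocount are all hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard amino acids
START, END = "^", "$"               # hypothetical boundary tokens

def fit_bigram(sequences, pseudocount=1.0):
    """Estimate p(next | previous) from unaligned sequences (toy stand-in model)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1.0
    targets = list(ALPHABET) + [END]
    model = {}
    for prev in [START] + list(ALPHABET):
        # Additive smoothing so unseen transitions keep non-zero probability.
        total = sum(counts[prev].values()) + pseudocount * len(targets)
        model[prev] = {t: (counts[prev][t] + pseudocount) / total for t in targets}
    return model

def log_likelihood(model, seq):
    """Sum of log conditionals: log p(x) = sum_i log p(x_i | x_{i-1})."""
    tokens = [START] + list(seq) + [END]
    return sum(math.log(model[p][n]) for p, n in zip(tokens, tokens[1:]))

def variant_score(model, wildtype, variant):
    """Log-likelihood difference; no alignment is needed, so lengths may differ."""
    return log_likelihood(model, variant) - log_likelihood(model, wildtype)

# Made-up family of unaligned sequences with different lengths.
family = ["MKTAYIAK", "MKTAYVAK", "MKSAYIAKQ", "MKTAYIK"]
model = fit_bigram(family)

wildtype = "MKTAYIAK"
missense = "MKTAWIAK"   # substitution Y -> W
deletion = "MKTAYAK"    # one residue deleted

print("missense score:", variant_score(model, wildtype, missense))
print("deletion score:", variant_score(model, wildtype, deletion))
```

Because the score is a difference of whole-sequence log-likelihoods, the same function handles substitutions, insertions, and deletions. Raw log-likelihoods multiply fewer factors for shorter sequences, so comparisons across lengths may call for some form of normalization in practice.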

Authors

  • Jung-Eun Shin
    Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
  • Adam J Riesselman
    Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
  • Aaron W Kollasch
    Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA.
  • Conor McMahon
    Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA.
  • Elana Simon
    Harvard College, Cambridge, MA, USA.
  • Chris Sander
    Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA. cccsander@gmail.com.
  • Aashish Manglik
    Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA.
  • Andrew C Kruse
    Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA. Andrew_Kruse@hms.harvard.edu.
  • Debora S Marks
    Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA. Electronic address: debbie@hms.harvard.edu.