Computational scoring and experimental evaluation of enzymes generated by neural networks.

Journal: Nature Biotechnology
Published Date:

Abstract

In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.

Authors

  • Sean R Johnson
    New England Biolabs, Ipswich, MA, USA.
  • Xiaozhi Fu
    Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden.
  • Sandra Viknander
    Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden.
  • Clara Goldin
    Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden.
  • Sarah Monaco
  • Aleksej Zelezniak
    Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE-412 96, Gothenburg, Sweden. aleksej.zelezniak@chalmers.se.
  • Kevin K Yang
    Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America.