On synergy between ultrahigh throughput screening and machine learning in biocatalyst engineering.

Journal: Faraday discussions
PMID:

Abstract

Protein design and directed evolution have separately contributed enormously to protein engineering. Without being mutually exclusive, the former relies on computation from first principles, while the latter is a combinatorial approach based on chance. Advances in ultrahigh throughput (uHT) screening, next generation sequencing and machine learning may create alternative routes to engineered proteins, where functional information linked to specific sequences is interpreted and extrapolated . In particular, the miniaturisation of functional tests in water-in-oil emulsion droplets with picoliter volumes and their rapid generation and analysis (>1 kHz) allows screening of >10-membered libraries in a day. Subsequently, decoding the selected clones by short or long-read sequencing methods leads to large sequence-function datasets that may allow extrapolation from experimental directed evolution to further improved mutants beyond the observed hits. In this work, we explore experimental strategies for how to draw up 'fitness landscapes' in sequence space with uHT droplet microfluidics, review the current state of AI/ML in enzyme engineering and discuss how uHT datasets may be combined with AI/ML to make meaningful predictions and accelerate biocatalyst engineering.

Authors

  • Maximilian Gantz
    Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK.
  • Simon V Mathis
    Department of Computer Science, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK.
  • Friederike E H Nintzel
    Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK.
  • Pietro Lió
    Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, UK.
  • Florian Hollfelder
    Department of Biochemistry, University of Cambridge, 80 Tennis Court, Cambridge, CB2 1QW, UK.