Engineering highly active nuclease enzymes with machine learning and high-throughput screening.
Journal:
Cell Systems
PMID:
40081373
Abstract
Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged fitness landscape and costly experiments. In this work, we present TeleProt, a machine learning (ML) framework that blends evolutionary and experimental data to design diverse protein libraries, and employ it to improve the catalytic activity of a nuclease enzyme that degrades biofilms that accumulate on chronic wounds. After multiple rounds of high-throughput experiments, TeleProt found a significantly better top-performing enzyme than directed evolution (DE), had a better hit rate at finding diverse, high-activity variants, and was even able to design a high-performance initial library using no prior experimental data. We have released a dataset of 55,000 nuclease variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date, to drive further progress in ML-guided design. A record of this paper's transparent peer review process is included in the supplemental information.