Ultrahigh throughput screening to train generative protein models for engineering specificity into unspecific peroxygenases

Journal: bioRxiv
Published Date:

Abstract

Enzyme engineering is central to developing biocatalysts with improved activity and specificity, yet traditional approaches are often limited by the scale of experimental screening. Here we combine ultrahigh-throughput microfluidic droplet screening with generative machine learning to engineer specificity into an unspecific peroxygenase (UPO) from Aspergillus brasiliensis (AbrUPO). A library of 5 × 10⁶ variants was expressed in Komagataella phaffii (Pichia pastoris) and screened for increased production of styrene oxide and reduced production of phenylacetaldehyde and phenylpropanol. The screen yielded a dataset of >30,000 unique sequences paired with function data. These data were used both for direct selection of enriched variants and to train a task-specific generative long short-term memory (LSTM) model using the Variational Search Distributions (VSD) framework. We compared variants selected by rank aggregation from the screening data (R series) with novel sequences generated by the refined generative model (G series). Experimental assays revealed that multiple G variants outperformed both wildtype AbrUPO and R variants in styrene oxide production and specificity. Correlation analyses further showed that the task-specific generative model outperformed protein models pretrained on large publicly available datasets in predicting experimental outcomes. Our results demonstrate that coupling ultrahigh-throughput screening with generative protein models enables efficient discovery of improved enzyme variants beyond those identified by direct screening alone, providing a scalable strategy for task-specific enzyme engineering.

Authors

  • Pradeep M. Nair; Daniel M. Steinberg; Tiago Resende; Angelica Faith Suarez; Helen Power; Cheng Soon Ong; Vijay Sairam; Shelly Hsiao-Ying Cheng; Lorivie Fragata; Alessandro Serafini; Victor Oh; Say Hwa Tan; Robert E. Speight; Akbar K. Vahidi