Precise Prediction of Promoter Strength Based on a De Novo Synthetic Promoter Library Coupled with Machine Learning.
Journal: ACS Synthetic Biology
PMID: 34927418
Abstract
Promoters are among the most critical regulatory elements controlling metabolic pathways. However, fast and accurate prediction of promoter strength remains challenging, which makes promoter construction and characterization time-consuming and labor-intensive. The difficulty stems from the lack of a large promoter library with graded strengths, a broad dynamic range, and clear sequence profiles that could be used to train an artificial intelligence model for promoter strength prediction. To overcome this challenge, we constructed and characterized a mutant library of Trc promoters using 83 rounds of mutation-construction-screening-characterization engineering cycles. After excluding invalid mutation sites, we established a synthetic promoter library of 3665 different variants spanning an intensity range of more than two orders of magnitude. The strongest variant was ∼69-fold stronger than the original and 1.52-fold stronger than a 1 mM isopropyl-β-d-thiogalactoside-driven promoter, with an ∼454-fold difference between the strongest and weakest expression levels. Using this synthetic promoter library, we built and optimized several machine learning models to explore the relationship between promoter sequence and transcriptional strength. Of these, the XGBoost model performed best, and we used it to precisely predict the strength of artificially designed promoter sequences (R² = 0.88, mean absolute error = 0.15, and Pearson correlation coefficient = 0.94). Our work provides a powerful platform that enables the predictable tuning of promoters to achieve optimal transcriptional strength.
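The three metrics quoted in the abstract (coefficient of determination, mean absolute error, and Pearson correlation coefficient) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the helper names, the one-hot encoding of sequences, and the toy strength values are hypothetical and are not taken from the paper, which does not state its feature encoding or the scale on which strength was measured.

```python
import math


def one_hot(seq):
    """Encode a DNA sequence as a flat 0/1 list, 4 values per base (A, C, G, T).

    Hypothetical feature encoding; the abstract does not say how sequences
    were presented to the model.
    """
    alphabet = "ACGT"
    return [1 if base == letter else 0 for base in seq.upper() for letter in alphabet]


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted strengths."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def pearson(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Toy values on an arbitrary scale, made up for this example (not paper data).
measured = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 0.9, 2.2, 2.8]

print("MAE:", mean_absolute_error(measured, predicted))
print("R^2:", r_squared(measured, predicted))
print("Pearson r:", pearson(measured, predicted))
print("one_hot('AC'):", one_hot("AC"))
```

In a real workflow, the encoded sequences would be passed to a regression model such as a gradient-boosted tree ensemble, and these metrics would be computed on held-out promoters rather than on the training set.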