Machine learning reveals novel compound for the improved production of chitooligosaccharides in Escherichia coli.

Journal: New biotechnology
PMID:

Abstract

To improve the predictability of outcomes and reduce costly rounds of trial and error, machine learning models have become increasingly important in synthetic biology. Besides applications in predicting genome annotation, process parameters and transcription initiation frequency, such models have also aided pathway optimization. Pathway optimization is a common strategy in metabolic engineering that improves the production of a desired compound by optimizing the expression levels of the enzymes in the production pathway. However, these engineering steps might not yield sufficient improvement, and bottlenecks may remain hidden among the hundreds of metabolic reactions occurring in a living cell, especially if the production pathway is highly interconnected with other parts of the cell's metabolism. Here, we use the synthesis of chitooligosaccharides (COS) to show that production from such complex pathways can be improved by using machine learning models and feature importance analysis to find new compounds that affect COS production. We screened Escherichia coli libraries of engineered transcription regulators, which were expected to cover a broad range of metabolic diversity, and trained several machine learning models to predict COS production titers. Subsequent feature analysis identified iron as a relevant compound, and we showed that its addition improved COS production in vivo up to two-fold. Additionally, the analysis revealed important clues for future engineering steps.
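The general idea of training a model to predict titers and then ranking input features by importance can be illustrated with a minimal sketch. The abstract does not state which models or which importance measure were used, so everything below is an assumption made for illustration: the data are synthetic, the feature names are hypothetical placeholders, and a plain linear model with permutation importance stands in for the authors' actual methods.

```python
import random


def fit_linear(X, y, lr=0.5, epochs=1000):
    """Fit y ~ w.x + b by batch gradient descent (plain Python, no dependencies)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]


def mse(y_true, y_pred):
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(model, X, y, rng, repeats=10):
    """Mean increase in MSE when one feature column is shuffled."""
    baseline = mse(y, predict(model, X))
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increase += mse(y, predict(model, X_perm)) - baseline
        scores.append(increase / repeats)
    return scores


# Synthetic stand-in data: hypothetical features, NOT measurements from the study.
rng = random.Random(0)
FEATURES = ["feature_a", "feature_b", "feature_c"]
X = [[rng.uniform(0, 1) for _ in FEATURES] for _ in range(200)]
# The simulated "titer" depends strongly on feature_b, weakly on feature_a,
# and not at all on feature_c.
y = [0.2 * a + 2.0 * b + 0.0 * c + rng.gauss(0, 0.05) for a, b, c in X]

model = fit_linear(X, y)
importances = permutation_importance(model, X, y, rng)
ranking = sorted(zip(FEATURES, importances), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy setting the feature that drives the simulated titer ranks first, which mirrors in spirit how a feature importance analysis can point to a candidate compound for experimental follow-up.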

Authors

  • Friederike Mey
    Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium.
  • Gaetan De Waele
    Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.
  • Wouter Demeester
    Center for Synthetic Biology, Department of Biotechnology, University of Ghent, Belgium.
  • Chiara Guidi
    Center for Synthetic Biology, Department of Biotechnology, University of Ghent, Belgium.
  • Dries Duchi
    Center for Synthetic Biology, Department of Biotechnology, University of Ghent, Belgium.
  • Tomek Diederen
    Institute for Molecular Systems Biology, Department of Biology, ETH Zurich, Switzerland.
  • Hanne Kochuyt
    Center for Synthetic Biology, Department of Biotechnology, University of Ghent, Belgium.
  • Kirsten Van Huffel
    Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium.
  • Nicola Zamboni
    Swiss Multi-Omics Center, ETH Zürich, Zürich, Switzerland.
  • Willem Waegeman
    KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium.
  • Marjan De Mey
    Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, 9000 Ghent, Belgium. Electronic address: marjan.demey@ugent.be.