Designing minimal E. coli genomes using variational autoencoders
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Designing minimal bacterial genomes remains a key challenge in synthetic biology. Efficient tools for rapidly generating streamlined bacterial genomes are currently lacking, which limits research in this area. Here, using a pangenome dataset for Escherichia coli, we show that variational autoencoders with modified loss functions can successfully create minimised genomes that retain the essential genes identified in the literature. We then sampled new genomes from our fitted model and validated them computationally using an E. coli whole-cell model. We found that 6 of the 100 sampled genomes were viable in the whole-cell model. These underwent a minimisation routine starting from the MG1655 genome, giving rise to six new minimal genomes with a size reduction of around 40%. This study proposes a rapid, machine learning-based approach for bacterial sequence generation that could accelerate the genomic design process.
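To make the idea of a "modified loss function" concrete, the sketch below shows one way a variational autoencoder loss could be extended for genome minimisation. The abstract does not specify the modification, so everything here is an assumption for illustration: a genome is encoded as a binary gene presence/absence vector, and the usual reconstruction and KL terms are joined by a hypothetical penalty `gamma` on the expected number of genes retained. The function name `vae_loss` and the parameters `beta` and `gamma` are not taken from the paper.

```python
import math


def vae_loss(x, x_prob, mu, log_var, beta=1.0, gamma=0.0):
    """Illustrative VAE loss for one genome (not the paper's actual loss).

    x       : list of 0/1 gene presence/absence values (assumed encoding)
    x_prob  : decoder output, probability that each gene is present
    mu      : latent means from the encoder
    log_var : latent log-variances from the encoder
    beta    : weight on the KL term
    gamma   : hypothetical weight penalising the expected gene count
    """
    eps = 1e-7  # keeps log() away from 0
    recon = 0.0
    expected_gene_count = 0.0
    for xi, pi in zip(x, x_prob):
        p = min(max(pi, eps), 1.0 - eps)
        # Binary cross-entropy between the input genome and its reconstruction
        recon -= xi * math.log(p) + (1 - xi) * math.log(1.0 - p)
        expected_gene_count += p
    # KL divergence between N(mu, exp(log_var)) and the standard normal prior
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
    # Assumed modification: larger reconstructed genomes cost more
    return recon + beta * kl + gamma * expected_gene_count


# Two-gene toy genome with an uninformative decoder output
base = vae_loss([1, 0], [0.5, 0.5], mu=[0.0], log_var=[0.0])
penalised = vae_loss([1, 0], [0.5, 0.5], mu=[0.0], log_var=[0.0], gamma=2.0)
print(base, penalised)
```

With `gamma = 0` this reduces to a standard beta-VAE objective. Raising `gamma` pushes the decoder towards genomes with fewer genes, and in such a scheme the reconstruction term would need enough weight on essential genes to keep them present.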