Structure of the Enterobacter pan-genome is revealed using machine learning.
Journal:
Microbiology Spectrum
Published Date:
Dec 15, 2025
Abstract
The growing availability of publicly accessible Enterobacter genomes offers an opportunity to reveal the structure of the genus's pangenome, uncovering the catalog of genes across the genus and their distribution across its species and subspecies. In this study, we analyze 777 high-quality complete Enterobacter genomes using a pangenome matrix. The accessory genome, consisting of the genes found in many, but not all, strains, was decomposed using non-negative matrix factorization (NMF) to identify groups of genes, called Phylons, that are present across subgroups of the genomes analyzed. The Phylons are representative of major modes of inheritance, both lineage-associated and horizontal, found across the pangenome. Using NMF, we defined 31 Phylons: 21 representing lineage-associated gene sets and 10 containing genes associated with mobile genetic elements. Six mobile Phylons were extrachromosomal, representing plasmids, and four were associated with chromosomal DNA. These 31 Phylons define the structure of the Enterobacter pangenome. This structure is consistent with the classification of an additional 2,291 fragmented genome sequences. It enables the pangenome-wide mapping of genetic traits, such as motility genes, biosynthetic gene clusters, antimicrobial resistance genes, and virulence factors. NMF thus enabled phylogenetic and functional classification of genomes based on the pangenome-scale assessment of a genome's gene portfolio. A robust classification of Enterobacter spp. enhances the understanding of the evolution of this clinically significant pathogen.
IMPORTANCE
Enterobacter spp. represent a vital member of the ESKAPEE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, Enterobacter species, and Escherichia coli), which are relevant for their nosocomial pathogenicity and antimicrobial resistance. Understanding the genomic diversity of the genus is vital for further study of its evolution and resistance potential. We constructed a pangenome of 777 complete Enterobacter genomes. Machine learning techniques were used to mathematically define major subpopulations of Enterobacter based on their accessory gene content, which for the first time defined dominant modes of lineage-associated and horizontal inheritance. This analysis provides insights into the distribution of traits related to antimicrobial resistance, biosynthetic gene clusters, and virulence factors. This study provides a robust classification of Enterobacter isolates, identifying differential genetic traits across the species and subspecies of the genus and overcoming some of the ambiguity in its taxonomy.
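The decomposition step described in the abstract can be illustrated with a minimal sketch. It assumes a binary gene-by-genome presence/absence matrix and factors it into a gene-weight matrix and a genome-weight matrix, so that each component plays the role of one Phylon. The toy matrix, the function names (`nmf`, `matmul`, `transpose`), the rank of 2, and the use of plain Lee-Seung multiplicative updates are all illustrative assumptions, not the authors' pipeline or parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor v (genes x genomes) into l (genes x rank) and a (rank x genomes).

    Uses Lee-Seung multiplicative updates for the squared-error objective.
    This is a generic textbook solver, assumed here for illustration only.
    """
    rng = random.Random(seed)
    n_genes, n_genomes = len(v), len(v[0])
    # Strictly positive random start so that no entry is locked at zero.
    l = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    a = [[rng.random() + 0.1 for _ in range(n_genomes)] for _ in range(rank)]
    for _ in range(n_iter):
        # a <- a * (l^T v) / (l^T l a)
        lt = transpose(l)
        num, den = matmul(lt, v), matmul(matmul(lt, l), a)
        a = [[a[k][g] * num[k][g] / (den[k][g] + eps) for g in range(n_genomes)]
             for k in range(rank)]
        # l <- l * (v a^T) / (l a a^T)
        at = transpose(a)
        num, den = matmul(v, at), matmul(l, matmul(a, at))
        l = [[l[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_genes)]
    return l, a


# Hypothetical accessory-genome matrix: rows are genes, columns are genomes,
# 1 = gene present. Two lineages, each carrying its own set of three genes.
accessory = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

rank = 2
gene_weights, genome_weights = nmf(accessory, rank)

# Assign each genome and each gene to the component with the largest weight.
genome_phylon = [max(range(rank), key=lambda k: genome_weights[k][g])
                 for g in range(len(accessory[0]))]
gene_phylon = [max(range(rank), key=lambda k: gene_weights[i][k])
               for i in range(len(accessory))]

print("genome assignments:", genome_phylon)
print("gene assignments:  ", gene_phylon)
```

In this toy case the genomes of each lineage fall into the same component as the genes they carry. Real pangenome matrices are far larger and noisier, so the choice of rank and the thresholds used to binarize the weights would need to be set and validated separately.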