Bacterial proteome foundation model enhances functional prediction from enzymes to ecological interactions

Journal: bioRxiv

Abstract

Bacteria play fundamental roles in ecosystems, human health, and biotechnology. Although bacterial genome sequencing data have accumulated rapidly over the past decade, the metabolic and ecological functions of most sequenced bacteria remain poorly understood, apart from a few well-studied taxa and traits. Building a general framework that captures the relationship between bacterial genomes and the diverse biological functions they encode remains a major challenge. Such a framework must embed individual genes within their broader genomic context and model the combined effects of gene interactions across complex biological pathways and networks. The task is further complicated by the limited functional annotation available for most bacterial genomes. Here, we introduce BacPT, a bacterial proteome foundation model trained on tens of thousands of complete genomes spanning diverse taxa. BacPT captures both local and genome-wide information, generating contextualized gene embeddings as well as functionally rich representations at the whole-organism level. We demonstrate the utility of BacPT on prediction tasks spanning multiple biological scales: BacPT embeddings improve the prediction of enzyme activities, biosynthetic gene clusters, metabolic traits, and ecological interaction outcomes. Our results show that unsupervised deep learning applied to entire proteomes provides a powerful approach for characterizing gene interactions and mapping the functional landscapes of bacteria.

Authors

  • Sethi, P.
  • Pereira, L. S.
  • Zhou, J.