Bacterial protein function prediction via multimodal deep learning

Journal: bioRxiv
Published Date:

Abstract

Bacterial proteins are functionally diverse and specialized for survival in varied and stressful environments. A significant portion of these proteins remains functionally uncharacterized, limiting our understanding of bacterial survival mechanisms. To address this gap, we developed Deep Expression STructure (DeepEST), a multimodal deep learning framework designed to accurately predict protein function in bacteria by assigning Gene Ontology (GO) terms. DeepEST comprises two modules: a multi-layer perceptron that takes gene expression and gene location as input features, and a protein structure-based predictor. Within DeepEST, we integrated these modules through a learnable weighted linear combination and introduced a novel masked loss function to fine-tune the structure-based predictor for bacterial species. These modeling choices are particularly well suited to bacteria because of the spatial organization of their circular genomes: functionally related genes frequently co-localize and are co-transcribed within operons, allowing transcription dynamics to serve as crucial, condition-dependent regulatory signals. We show that, on a 25-species benchmark, DeepEST outperforms existing protein function prediction methods that rely solely on amino acid sequence or protein structure. Moreover, DeepEST predicts GO terms for unclassified hypothetical proteins across 25 human bacterial pathogens, facilitating the design of experimental setups for characterization studies. By combining expression, gene location, and structure information in a unified deep learning framework, DeepEST bridges organism-specific data integration and structure-based transfer learning, providing a method tailored for bacterial protein function prediction in settings where structural and multi-condition expression data are available.
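
The two mechanisms named in the abstract, a learnable weighted linear combination of the module outputs and a masked loss, can be illustrated with a minimal sketch. The abstract does not give their exact form, so everything below is an assumption made for illustration: the per-GO-term sigmoid-gated weight, the binary cross-entropy objective, the meaning of the mask (1 = this GO term contributes to the loss), and all function and variable names. In a real model the `alpha_logits` would be trained by gradient descent in a deep learning framework; here they are fixed numbers.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine(p_expr, p_struct, alpha_logits):
    """Hypothetical per-GO-term weighted linear combination.

    p_expr:       probabilities from the expression/location module
    p_struct:     probabilities from the structure-based module
    alpha_logits: one weight per GO term (assumed learnable), squashed
                  to (0, 1) so the result stays a valid probability
    """
    combined = []
    for pe, ps, a in zip(p_expr, p_struct, alpha_logits):
        w = sigmoid(a)
        combined.append(w * pe + (1.0 - w) * ps)
    return combined


def masked_bce(probs, labels, mask, eps=1e-7):
    """Binary cross-entropy averaged over unmasked GO terms only.

    Terms with mask == 0 are skipped entirely (assumed semantics).
    """
    total, count = 0.0, 0
    for p, y, m in zip(probs, labels, mask):
        if not m:
            continue
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# Toy example with three GO terms; the third is masked out of the loss.
p_expr = [0.9, 0.2, 0.5]
p_struct = [0.7, 0.4, 0.1]
alpha_logits = [0.0, 0.0, 0.0]  # sigmoid(0) = 0.5, i.e. equal weighting
combined = combine(p_expr, p_struct, alpha_logits)
loss = masked_bce(combined, labels=[1, 0, 1], mask=[1, 1, 0])
```

With equal weighting, each combined score is the mean of the two module outputs, and the loss is computed from the first two GO terms only. Whether DeepEST uses one weight per GO term or a single global weight, and how its mask is defined, is not stated in the abstract.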

Authors

  • Muzio, G.
  • Adamer, M.
  • Fernandez, L.
  • Miklautz, L.
  • Borgwardt, K.
  • Avican, K.

Categories