DeepGOMeta for functional insights into microbial communities using deep learning-based protein function prediction.

Journal: Scientific Reports
PMID:

Abstract

Analyzing microbial samples remains computationally challenging due to their diversity and complexity. The lack of robust de novo protein function prediction methods makes it even harder to derive functional insights from these samples. Traditional prediction methods, which depend on homology and sequence similarity, often fail to predict functions for novel proteins and proteins without known homologs. Moreover, most of these methods have been trained on largely eukaryotic data and have not been evaluated on or applied to microbial datasets. This research introduces DeepGOMeta, a deep learning model that predicts protein functions as Gene Ontology (GO) terms and is trained on a dataset relevant to microbes. The model is applied to diverse microbial datasets to demonstrate its use for gaining biological insights. Data and code are available at https://github.com/bio-ontology-research-group/deepgometa.

Authors

  • Rund Tawfiq
    KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.
  • Kexin Niu
    KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.
  • Robert Hoehndorf
    Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. robert.hoehndorf@kaust.edu.sa.
  • Maxat Kulmanov
    Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.