Comprehensive Functional Annotation of Metagenomes and Microbial Genomes Using a Deep Learning-Based Method.

Journal: mSystems
PMID:

Abstract

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in the host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although they are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes on well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to novel understanding of the functional signature of the human gut microbiome in health and disease as well as guiding future metagenomics studies.

The past decade has seen advancement in high-throughput sequencing technologies resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information coming from either experimental sources or inferences is low. To solve these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using a deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement compared to 12% Gene Ontology term annotation coverage by commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach combining deep-learning functional predictions with the commonly used orthology-based annotations as one that could help us uncover novel functions observed in metagenomic microbiome studies.
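
The two headline metrics in the abstract, annotation coverage and concordance between annotators, can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not define how concordance was computed, so the definition below (among genes annotated by both tools, the fraction sharing at least one identical Gene Ontology term) is an assumption, and it ignores the GO hierarchy. The function names, gene IDs, and toy annotations are all hypothetical.

```python
from typing import Dict, Iterable, Set

Annotations = Dict[str, Set[str]]  # gene ID -> set of GO term IDs


def annotation_coverage(annotations: Annotations, gene_ids: Iterable[str]) -> float:
    """Fraction of genes in the catalogue with at least one GO term."""
    genes = list(gene_ids)
    if not genes:
        return 0.0
    annotated = sum(1 for gene in genes if annotations.get(gene))
    return annotated / len(genes)


def concordance(first: Annotations, second: Annotations) -> float:
    """Assumed definition: among genes annotated by both sources, the
    fraction whose GO term sets share at least one identical term."""
    shared = [gene for gene in first if first[gene] and second.get(gene)]
    if not shared:
        return 0.0
    agreeing = sum(1 for gene in shared if first[gene] & second[gene])
    return agreeing / len(shared)


# Toy, made-up catalogue of four genes (hypothetical IDs and annotations).
genes = ["g1", "g2", "g3", "g4"]
deepfri: Annotations = {
    "g1": {"GO:0003824"},
    "g2": {"GO:0005488"},
    "g3": {"GO:0003677"},
    "g4": {"GO:0016787"},
}
eggnog: Annotations = {
    "g1": {"GO:0003824", "GO:0016787"},
    "g2": {"GO:0003677"},
}

if __name__ == "__main__":
    print("DeepFRI coverage:", annotation_coverage(deepfri, genes))
    print("eggNOG coverage:", annotation_coverage(eggnog, genes))
    print("Concordance:", concordance(deepfri, eggnog))
```

A hierarchy-aware comparison, in which a general term counts as agreeing with a more specific descendant, would likely be more appropriate given that the DeepFRI annotations are described as less specific than those from eggNOG. That would require loading the GO graph, which is omitted here.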

Authors

  • Mary Maranga
    Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
  • Pawel Szczerbiak
    Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
  • Valentyn Bezshapkin
    Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland.
  • Vladimir Gligorijević
    Center for Computational Biology, Flatiron Institute, New York, NY, USA. vgligorijevic@flatironinstitute.org.
  • Chris Chandler
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Richard Bonneau
    Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003, Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD and Detroit, MI 48201, USA, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 and Department of Computer Science, Wayne State University, Detroit, MI 48202, USA.
  • Ramnik J Xavier
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Tommi Vatanen
    Broad Institute of MIT and Harvard, Cambridge, MA, USA.
  • Tomasz Kosciolek
    Department of Computer Science, Bioinformatics Group, University College London, Gower Street, London, WC1E 6BT, United Kingdom.