DeepToA: an ensemble deep-learning approach to predicting the theater of activity of a microbiome.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Metagenomics is the study of microbiomes using DNA sequencing. A microbiome consists of an assemblage of microbes that is associated with a 'theater of activity' (ToA). An important question is to what degree the taxonomic and functional content of the former depends on the details of the latter. Here, we investigate a related technical question: given a taxonomic and/or functional profile estimated from metagenomic sequencing data, how can the associated ToA be predicted? We present a deep-learning approach to this question. We use both taxonomic and functional profiles as input. We apply node2vec to embed hierarchical taxonomic profiles into numerical vectors. We then perform dimension reduction using clustering, to address the sparseness of the taxonomic data and thus make the problem more amenable to deep-learning algorithms. Functional features are combined with textual descriptions of protein families or domains. We present an ensemble deep-learning framework, DeepToA, for predicting the ToA of a microbial community based on taxonomic and functional profiles. We use SHAP (SHapley Additive exPlanations) values to determine which taxonomic and functional features are important for the prediction.
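
Two steps named in the abstract, clustering-based dimension reduction and ensembling, can be illustrated with a minimal sketch. This is not the DeepToA implementation. It assumes that each taxon has already been assigned to a cluster (for example by clustering its node2vec embedding) and that the ensemble combines per-model class probabilities by simple averaging (soft voting); the abstract confirms neither detail. The function names, taxon names and probabilities are hypothetical.

```python
def aggregate_by_cluster(profile, taxon_to_cluster, n_clusters):
    """Collapse a sparse taxon -> abundance profile into one value per cluster.

    Assumption: cluster labels come from clustering taxon embeddings beforehand.
    """
    dense = [0.0] * n_clusters
    for taxon, abundance in profile.items():
        cluster = taxon_to_cluster.get(taxon)
        if cluster is not None:  # taxa without a cluster label are skipped
            dense[cluster] += abundance
    return dense


def soft_vote(prob_lists):
    """Average per-class probabilities over models (assumed ensembling rule).

    Returns the index of the winning class and the averaged probabilities.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg


# Hypothetical cluster labels and relative abundances, for illustration only.
taxon_to_cluster = {"Escherichia": 0, "Salmonella": 0, "Bacteroides": 1}
profile = {"Escherichia": 0.25, "Salmonella": 0.25, "Bacteroides": 0.5}
reduced = aggregate_by_cluster(profile, taxon_to_cluster, n_clusters=2)

# Hypothetical class probabilities from three models over two candidate ToAs.
best, avg = soft_vote([[0.75, 0.25], [0.25, 0.75], [0.75, 0.25]])
```

Summing abundances within a cluster turns a long, mostly zero taxon vector into a short dense one, which is the kind of input the abstract describes as more amenable to deep-learning algorithms.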

Authors

  • Wenhuan Zeng
    Department of Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72076, Germany.
  • Anupam Gautam
    Department of Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen 72076, Germany.
  • Daniel H Huson
    University of Tübingen, Department of Computer Science, Sand 14, Tübingen, 72076, Germany.