Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data.

Journal: PLoS computational biology
PMID:

Abstract

We use open-source human gut microbiome data to learn a microbial "language" model by adapting techniques from Natural Language Processing (NLP). Our microbial "language" model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial taxa and the common compositional patterns in microbial communities. The learned model produces contextualized taxon representations that allow a single microbial taxon to be represented differently according to the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting the different microbial taxa in the sample and their interactions as a whole. We demonstrate that, while our sample representation performs comparably to baseline models on in-domain prediction tasks such as predicting Inflammatory Bowel Disease (IBD) and diet patterns, it significantly outperforms them when generalizing to test data from independent studies, even in the presence of substantial distribution shifts. Through a variety of analyses, we further show that the pre-trained, context-sensitive embedding captures meaningful biological information, including taxonomic relationships, correlations with biological pathways, and relevance to IBD expression, despite the model never being explicitly exposed to such signals.

Authors

  • Quintin Pope
    School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America.
  • Rohan Varma
    Electrical and Computer Engineering, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
  • Christine Tataru
    Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America.
  • Maude M David
    Department of Pharmaceutical Sciences, Oregon State University, Corvallis, Oregon, United States of America.
  • Xiaoli Fern
    School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, United States of America.