Recent advances in deep learning and language models for studying the microbiome.

Journal: Frontiers in Genetics
Published Date:

Abstract

Recent advancements in deep learning, particularly large language models (LLMs), have had a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein and genomic language models and their contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.

Authors

  • Binghao Yan
    Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
  • Yunbi Nam
    Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States.
  • Lingyao Li
    School of Information, University of South Florida, Tampa, FL, United States.
  • Rebecca A Deek
    Department of Biostatistics and Health Data Science, University of Pittsburgh, Pittsburgh, PA, United States.
  • Hongzhe Li
    Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.
  • Siyuan Ma
    Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States.
