SetBERT: the deep learning platform for contextualized embeddings and explainable predictions from high-throughput sequencing.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: High-throughput sequencing profiles microbiomes by sequencing thousands of short genomic fragments from the microorganisms within a given sample. This technology presents a unique opportunity for artificial intelligence to comprehend the underlying functional relationships of microbial communities. However, because high-throughput sequencing data are unstructured, nearly all computational models are limited to processing DNA sequences individually. This limitation causes them to miss key interactions between microorganisms, significantly hindering our understanding of how these interactions influence microbial communities as a whole. Furthermore, most computational methods rely on post-processing of samples, which could inadvertently introduce protocol-specific bias.

Authors

  • David W Ludwig
    Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, United States.
  • Christopher Guptil
    Department of Mathematics and Computer Science, Miami University, Oxford, OH, United States.
  • N Reed Alexander
    Department of Biology, Middle Tennessee State University, Murfreesboro, TN, United States.
  • Kateryna Zhalnina
    Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States.
  • Edi M-L Wipf
    Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States.
  • Albina Khasanova
    Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States.
  • Nicholas A Barber
    Department of Biology, San Diego State University, San Diego, CA, United States.
  • Wesley Swingley
    Department of Biological Sciences, Northern Illinois University, DeKalb, IL, United States.
  • Donald M Walker
    Department of Biology, Middle Tennessee State University, Murfreesboro, TN, United States.
  • Joshua L Phillips
    Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, United States.
