Ontology-aware deep learning enables ultrafast and interpretable source tracking among sub-million microbial community samples from hundreds of niches.

Journal: Genome medicine

Abstract

The taxonomic structure of a microbial community sample is highly habitat-specific, which makes source tracking possible, that is, identifying the niches from which samples originate. However, current methods face challenges when source tracking is scaled up. Here, we introduce ONN4MST, a deep learning method based on the Ontology-aware Neural Network approach, for large-scale source tracking. When tracking sources among 125,823 samples from 114 niches, ONN4MST outperformed other methods and achieved near-optimal accuracy. ONN4MST also has a broad spectrum of applications. Overall, this study presents the first model-based method for source tracking among sub-million microbial community samples from hundreds of niches, with superior speed, accuracy, and interpretability. ONN4MST is available at https://github.com/HUST-NingKang-Lab/ONN4MST.

Authors

  • Yuguo Zha
    MOE Key Laboratory of Molecular Biophysics, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
  • Hui Chong
    MOE Key Laboratory of Molecular Biophysics, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
  • Hao Qiu
    Department of Orthopedics, Xinqiao Hospital, Army Military Medical University, Chongqing, People's Republic of China. Electronic address: qiutmmu@163.com.
  • Kai Kang
    Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America.
  • Yuzheng Dun
    School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
  • Zhixue Chen
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China.
  • Xuefeng Cui
    Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China. Electronic address: xfcui@email.sdu.edu.cn.
  • Kang Ning
    MOE Key Laboratory of Molecular Biophysics, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China. Electronic address: ningkang@hust.edu.cn.