DeepCheck: multitask learning aids in assessing microbial genome quality.

Journal: Briefings in bioinformatics
PMID:

Abstract

Metagenomic analyses facilitate the exploration of the microbial world, advancing our understanding of microbial roles in ecological and biological processes. A pivotal aspect of metagenomic analysis is assessing the quality of metagenome-assembled genomes (MAGs), which is crucial for accurate biological insights. Current machine learning-based methods often treat completeness and contamination prediction as separate tasks, overlooking their inherent relationship and limiting the models' generalization. In this study, we present DeepCheck, a multitask deep learning framework for the simultaneous prediction of MAG completeness and contamination. DeepCheck consistently outperforms existing tools in accuracy across various experimental settings, runs at comparable speed, and maintains high predictive accuracy even for new lineages. Additionally, we employ interpretable machine learning techniques to identify the specific genes and pathways that drive the model's predictions, enabling independent investigation and assessment of these biological elements for deeper insights.
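The general idea of multitask prediction described above, one shared representation feeding two task-specific outputs trained with a joint loss, can be illustrated with a minimal NumPy sketch. This is not DeepCheck's architecture, which the abstract does not specify. The layer sizes, the function names (`init_params`, `forward`, `joint_loss`), the output activations, and the loss weighting are all hypothetical choices made for illustration only.

```python
import numpy as np

def init_params(n_features, n_hidden, seed=0):
    """Randomly initialise a shared layer plus two task heads (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W_shared": rng.normal(0.0, 0.1, (n_features, n_hidden)),
        "b_shared": np.zeros(n_hidden),
        "w_comp": rng.normal(0.0, 0.1, n_hidden),  # completeness head
        "b_comp": 0.0,
        "w_cont": rng.normal(0.0, 0.1, n_hidden),  # contamination head
        "b_cont": 0.0,
    }

def forward(params, X):
    """Return (completeness, contamination) predictions for a feature matrix X."""
    # Shared ReLU layer: both tasks read the same learned representation.
    h = np.maximum(0.0, X @ params["W_shared"] + params["b_shared"])
    # Assumption: completeness is modelled as a fraction in (0, 1) via a sigmoid.
    completeness = 1.0 / (1.0 + np.exp(-(h @ params["w_comp"] + params["b_comp"])))
    # Assumption: contamination is modelled as a positive value via softplus.
    contamination = np.log1p(np.exp(h @ params["w_cont"] + params["b_cont"]))
    return completeness, contamination

def joint_loss(params, X, y_comp, y_cont, weight=0.5):
    """Weighted sum of the two mean squared errors (the weight is an assumption)."""
    p_comp, p_cont = forward(params, X)
    mse_comp = np.mean((p_comp - y_comp) ** 2)
    mse_cont = np.mean((p_cont - y_cont) ** 2)
    return weight * mse_comp + (1.0 - weight) * mse_cont
```

Because both heads share the hidden layer, minimising the joint loss updates one representation with gradients from both tasks. This coupling is what distinguishes a multitask model from two separately trained predictors.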

Authors

  • Guo Wei
    Department of Medicine, University of Utah School of Medicine, Salt Lake City, USA.
  • Nannan Wu
    State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China.
  • Kunyang Zhao
    State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China.
  • Sihai Yang
    State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 163 Xianlin Avenue, Qixia District, Nanjing 210000, China.
  • Long Wang
  • Yan Liu
    Department of Clinical Microbiology, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, People's Republic of China.