Genome-resolved metagenomics from short-read sequencing data in the era of artificial intelligence.

Journal: Functional & Integrative Genomics
Published Date:

Abstract

Genome-resolved metagenomics is a computational approach that enables researchers to reconstruct microbial genomes directly from a given sample. The process involves three major steps: (i) preprocessing of the reads, (ii) metagenome assembly, and (iii) genome binning, with (iv) taxonomic classification and (v) functional annotation as additional steps. Despite the availability of multiple bioinformatics approaches, metagenomic data analysis faces various challenges due to the high dimensionality, sparsity, and complexity of the data. Meanwhile, integrating artificial intelligence (AI) at different stages of data analysis has transformed genome-resolved metagenomics. Although machine learning and deep learning were applied to metagenomic annotation earlier, the emergence of better sequencing technologies, improved throughput, and reduced processing time has rendered the initial models less efficient. Consequently, the number of AI-based metagenomics tools is continuously increasing. Recent AI-based tools demonstrate superior performance in handling complex and multi-dimensional metagenomics data, offering improved accuracy, scalability, and efficiency compared to traditional models. In this paper, we review recent AI-based tools specifically developed for short-read metagenomic data and their underlying models for genome-resolved metagenomics. We also discuss the performance of these tools and give an overview of their usability in metagenomics research. We believe this study will provide researchers with insights into the strengths and limitations of current AI-based approaches, serving as a valuable resource for selecting appropriate tools and guiding future advancements in genome-resolved metagenomics.

Authors

  • Hajra Qayyum
    Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences & Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan.
  • Zaara Ishaq
    Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan.
  • Amjad Ali
    Department of Computer Science, University of Peshawar, Peshawar, Pakistan.
  • Masood Ur Rehman Kayani
    Metagenomics Discovery Laboratory, School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan. m.kayani@sines.nust.edu.pk.
  • Lisu Huang
    Department of Infectious Disease, Children's Hospital, Zhejiang University School of Medicine, 3333 Binsheng Road, Binjiang District, Hangzhou, 310052, China. lisuhuang@zju.edu.cn.