DSNetax: a deep learning species annotation method based on a deep-shallow parallel framework.

Journal: Briefings in bioinformatics
PMID:

Abstract

Microbial community analysis studies the composition and function of microbial communities, and microbial species annotation is crucial to revealing the complex ecological roles of microorganisms in environmental, ecological and host interactions. Widely used methods can suffer from inaccurate species-level annotation and from time and memory constraints. As sequencing technology advances and sequencing costs decline, annotation methods with higher classification quality become critical. We therefore processed 16S rRNA gene sequences into k-mer sets and used a trained DNABERT model to generate word vectors. We also designed a parallel network structure consisting of deep and shallow modules to extract the semantic and detailed features of 16S rRNA gene sequences. Our method accurately and rapidly classifies bacterial sequences at the genus and species levels on the SILVA database, which is characterized by long sequences (1500 base pairs), a large number of sequences (428,748 reads) and high sequence similarity. At the species level, the method is nearly 20% more accurate than the currently popular naive Bayes-based QIIME 2 annotation method, and its top-5 species-level results differ from those of BLAST methods by <2%. In summary, our approach combines multiple deep learning modules to overcome the limitations of existing methods, providing an efficient and accurate solution for microbial species annotation and more reliable data support for microbiological research and applications.
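
The k-mer step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `kmer_tokens`, the choice of k = 6, the stride of 1 and the U-to-T conversion are all assumptions made for illustration, since the abstract does not state these details. The example sequence is likewise an arbitrary short fragment, not data from the paper.

```python
def kmer_tokens(sequence: str, k: int = 6) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mers (stride 1).

    Assumptions (not taken from the abstract): k defaults to 6, and RNA-style
    'U' is converted to 'T' so that tokens use a DNA alphabet.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper().replace("U", "T")
    # A sequence of length n yields n - k + 1 overlapping k-mers;
    # a sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Arbitrary 20-nucleotide fragment used only to show the token layout.
example = "AGAGTTTGATCCTGGCTCAG"
tokens = kmer_tokens(example, k=6)

# Space-joined k-mers form a "sentence" that a language-model tokenizer
# could consume; how the tokens are fed to DNABERT is not shown here.
sentence = " ".join(tokens)
print(len(tokens), tokens[:3])
```

Joining the k-mers with spaces treats each sequence as a sentence of overlapping "words", which is the general idea behind generating word vectors from sequence data.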

Authors

  • Hongyuan Zhao
    National Engineering Research Center for Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, Wuxi, China.
  • Suyi Zhang
    State Key Laboratory of Transducer Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China.
  • Hui Qin
    Department of Intensive Care Medicine, The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, The Third Affiliated Hospital of Nanjing Medical University, Nanjing Medical University, Changzhou, China.
  • Xiaogang Liu
    Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China. gary.samsph@gmail.com.
  • Dongna Ma
    National Engineering Research Center of Cereal Fermentation and Food Biomanufacturing, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China.
  • Xiao Han
    College of Chemistry, Chemical Engineering and Materials Science, Collaborative Innovation Center of Functionalized Probes for Chemical Imaging in Universities of Shandong, Key Laboratory of Molecular and Nano Probes, Ministry of Education, Shandong Provincial Key Laboratory of Clean Production of Fine Chemicals, Shandong Normal University, Jinan 250014, China. cyzhang@sdnu.edu.cn.
  • Jian Mao
    State Key Laboratory of Heavy Oil Processing and College of Chemistry and Chemical Engineering, China University of Petroleum (East China), Qingdao 266580, China.
  • Shuangping Liu
    Center for Bio-inspired Energy Science, Northwestern University, 2145 Sheridan Road, Evanston, IL 60208, USA.