ProtAlign-ARG: antibiotic resistance gene characterization integrating protein language models and alignment-based scoring.

Journal: Scientific Reports

Abstract

The evolution and spread of antibiotic resistance pose a global health challenge. Whole-genome and metagenomic sequencing offer a promising approach to monitoring this spread, but typical alignment-based approaches to antibiotic resistance gene (ARG) detection are inherently limited in their ability to detect new variants. Large protein language models could be a powerful alternative but are constrained by the databases available for training. Here we introduce ProtAlign-ARG, a novel hybrid model that combines a pre-trained protein language model with an alignment scoring-based model to expand the capacity for ARG detection from DNA sequencing data. ProtAlign-ARG learns from vast unannotated protein sequences, using raw protein language model embeddings to improve the accuracy of ARG classification. When the model lacks confidence, ProtAlign-ARG falls back on an alignment-based scoring method that uses bit scores and e-values to classify ARGs according to their corresponding classes of antibiotics. ProtAlign-ARG identified and classified ARGs with high accuracy and, in particular, achieved higher recall than existing ARG identification and classification tools. We also extended ProtAlign-ARG to predict the functionality and mobility of ARGs, highlighting the model's robustness across predictive tasks. A comprehensive comparison showed that ProtAlign-ARG outperformed both the alignment-based scoring model and the pre-trained protein language model used on their own.
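The fallback logic described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `AlignmentHit` and `classify_hybrid`, the confidence threshold, the e-value and bit-score cutoffs, and the rule of picking the usable hit with the highest bit score are all assumptions chosen for illustration. The abstract states only that alignment scoring with bit scores and e-values is used when the language model lacks confidence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AlignmentHit:
    """One alignment hit against a reference ARG database (hypothetical type)."""
    arg_class: str    # antibiotic class of the reference sequence
    bit_score: float  # higher means a stronger alignment
    e_value: float    # lower means a more significant alignment


def classify_hybrid(
    plm_probs: Dict[str, float],
    hits: List[AlignmentHit],
    confidence_threshold: float = 0.9,  # assumed value, not from the paper
    max_e_value: float = 1e-10,         # assumed value, not from the paper
    min_bit_score: float = 50.0,        # assumed value, not from the paper
) -> Tuple[Optional[str], str]:
    """Return (predicted class, source of the decision).

    plm_probs maps each antibiotic class to the probability produced by a
    classifier built on protein language model embeddings.
    """
    # Step 1: trust the language model when it is confident enough.
    best_class = max(plm_probs, key=plm_probs.get)
    if plm_probs[best_class] >= confidence_threshold:
        return best_class, "plm"

    # Step 2: otherwise fall back to alignment scoring, keeping only hits
    # that pass both the e-value and bit-score cutoffs.
    usable = [
        h for h in hits
        if h.e_value <= max_e_value and h.bit_score >= min_bit_score
    ]
    if usable:
        best_hit = max(usable, key=lambda h: h.bit_score)
        return best_hit.arg_class, "alignment"

    # Step 3: neither component gives a usable answer (assumed behaviour).
    return None, "unresolved"


if __name__ == "__main__":
    confident = {"beta-lactam": 0.97, "tetracycline": 0.03}
    uncertain = {"beta-lactam": 0.55, "tetracycline": 0.45}
    hits = [
        AlignmentHit("tetracycline", bit_score=210.0, e_value=1e-50),
        AlignmentHit("beta-lactam", bit_score=60.0, e_value=1e-12),
    ]
    print(classify_hybrid(confident, hits))
    print(classify_hybrid(uncertain, hits))
    print(classify_hybrid(uncertain, []))
```

Gating on the language model first keeps its ability to recognize divergent variants, while the alignment step supplies a decision for sequences where the model's probability mass is spread across several classes.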

Authors

  • Shafayat Ahmed
    Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Muhit Islam Emon
    Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Nazifa Ahmed Moumi
    Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Lifu Huang
    Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Dawei Zhou
    Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Peter Vikesland
    Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, USA.
  • Amy Pruden
    Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA, USA.
  • Liqing Zhang
    Department of Computer Science, Virginia Tech, Blacksburg, VA, USA. lqzhang@cs.vt.edu.
