Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.

Journal: Interdisciplinary sciences, computational life sciences
Published Date:

Abstract

The emergence of methicillin-resistant Staphylococcus aureus (MRSA) as a recognized cause of community-acquired and hospital infections has created a need for efficient and accurate identification of peptides with anti-MRSA properties in drug discovery and development pipelines. However, current experimental methods are often labor- and resource-intensive. Thus, there is an immediate need for practical computational solutions that identify anti-MRSA peptides from sequence information. Recently, pre-trained protein language models (pLMs) have emerged as a notable advance for encoding peptide sequences as discriminative feature embeddings, capturing rich protein-level information and successfully repurposing it for in silico peptide property prediction. In this study, we present pLM4MRSA, a pLM-based framework designed to enhance the accuracy of anti-MRSA peptide prediction. In this framework, we combine feature embeddings from various pLMs, such as ProtTrans and evolutionary scale modeling (ESM-2), which provide complementary information for prediction. The embeddings from the individual pLMs are integrated to form hybrid feature embeddings. Next, we apply principal component analysis (PCA) to these hybrid embeddings. The resulting PCA-transformed feature vectors are then used as inputs for constructing the predictive model. Experimental results on the independent test dataset showed that the proposed pLM4MRSA approach achieved a balanced accuracy and Matthews correlation coefficient of 0.983 and 0.980, respectively, representing improvements over the state-of-the-art methods of 2.53%-4.83% and 7.73%-13.23%, respectively. This indicates that pLM4MRSA is a high-performance prediction model with an excellent scope of applicability.
Additionally, comparison with well-known hand-crafted features demonstrated that the proposed hybrid feature embeddings complement each other effectively, capturing discriminative patterns for more accurate anti-MRSA peptide prediction. We anticipate that pLM4MRSA will serve as an effective solution for accurate and high-capacity prediction of anti-MRSA peptides from peptide sequences.
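
The feature-construction steps described in the abstract (concatenating embeddings from several pLMs, then applying PCA) and the two reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the random placeholder embeddings, the embedding widths (1024 and 1280), and the number of retained components are all assumptions made for illustration; the abstract does not state these details or which classifier is trained on the PCA-transformed vectors.

```python
import numpy as np


def hybrid_embeddings(*blocks):
    """Concatenate per-peptide embeddings from several pLMs along the feature axis."""
    return np.concatenate(blocks, axis=1)


def pca_fit(X, n_components):
    """Fit PCA via SVD of the centered data; returns the mean and principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project data onto previously fitted principal axes."""
    return (X - mean) @ components.T


def _confusion(y_true, y_pred):
    """Binary confusion counts (positive class = 1) as floats."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = float(np.sum((y_true == 1) & (y_pred == 1)))
    tn = float(np.sum((y_true == 0) & (y_pred == 0)))
    fp = float(np.sum((y_true == 0) & (y_pred == 1)))
    fn = float(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, tn, fp, fn = _confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = _confusion(y_true, y_pred)
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den > 0 else 0.0


# Placeholder data: random vectors stand in for real pLM embeddings (assumption).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(40, 1024))  # hypothetical width for a ProtTrans model
emb_b = rng.normal(size=(40, 1280))  # hypothetical width for an ESM-2 model

hybrid = hybrid_embeddings(emb_a, emb_b)
mean, axes = pca_fit(hybrid, n_components=10)  # 10 is an arbitrary choice
features = pca_transform(hybrid, mean, axes)
print(hybrid.shape, features.shape)
```

In a real evaluation, the PCA mean and axes would be fitted on the training peptides only and then reused unchanged to transform the independent test set, so that no test information leaks into the features.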

Authors

  • Watshara Shoombuatong
    Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
  • Pakpoom Mookdarsanit
    Faculty of Science, Computer Science and Artificial Intelligence, Chandrakasem Rajabhat University, Bangkok, 10900, Thailand.
  • Lawankorn Mookdarsanit
    Business Information System, Faculty of Management Science, Chandrakasem Rajabhat University, Bangkok, 10900, Thailand. lawankorn.s@chandra.ac.th.
  • Nalini Schaduangrat
    Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
  • Saeed Ahmed
    School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
  • Muhammad Kabir
    Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
  • Pramote Chumnanpuen
    Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand.