NToxSEM: Enhancing prediction of neurotoxic peptides and neurotoxins using a stacked ensemble-based multimodal framework.
Journal:
Protein science : a publication of the Protein Society
Published Date:
Jul 1, 2026
Abstract
The safety assessment of therapeutic proteins and genetically modified (GM) organisms relies heavily on the rapid and accurate prediction of peptides and proteins that exhibit neurotoxic activity. Because experimental methods are time-consuming and costly, they are poorly suited to the cost-effective characterization of neurotoxic peptides and neurotoxins. Machine learning (ML)-based methods that can predict neurotoxic peptides and neurotoxins from sequence information are therefore highly desirable. In this study, we propose NToxSEM, an innovative stacked ensemble framework that uses a multimodal representation approach to predict neurotoxic peptides and neurotoxins with high accuracy (ACC). To the best of our knowledge, this is the first application of a multimodal stacked ensemble-based architecture to the prediction of both neurotoxic peptides and neurotoxins. NToxSEM generates features from multiple modalities, including sequence-based, image-based, and pretrained language model-based feature representations, which together systematically capture information-rich characteristics of neurotoxic peptides and neurotoxins. In addition, NToxSEM uses a two-stage prediction strategy to refine predictive performance: the first stage constructs preliminary prediction models, while the second stage selects candidate prediction models through several feature selection methods and integrates them into the final model. Extensive comparative experiments on several independent test datasets demonstrate that NToxSEM consistently outperforms existing methods, achieving Matthews correlation coefficient (MCC) values of 0.864, 0.841, and 0.834 on the peptide, protein, and combined (DATs-Com) datasets, respectively. We anticipate that this prediction model can help narrow down and select candidate peptides and proteins with neurotoxic activity.
All code and datasets are available at: https://github.com/saeed344/NToxSEM.
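The two-stage strategy described in the abstract (build one preliminary model per feature modality, then select and integrate the useful ones) can be illustrated with a small sketch. This is not the authors' implementation: the names `CentroidModel`, `fit_stage_one`, `select_models`, `predict_stage_two`, and the `min_mcc` threshold are hypothetical, the toy data are invented for illustration, and the sketch substitutes a simple MCC threshold and probability averaging for the feature selection methods and stacked meta-model that the paper uses.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


class CentroidModel:
    """Toy base learner (assumption): scores samples by distance to class centroids."""

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def fit(self, X, y):
        self.pos = self._mean([x for x, t in zip(X, y) if t == 1])
        self.neg = self._mean([x for x, t in zip(X, y) if t == 0])
        return self

    def predict_proba(self, X):
        scores = []
        for x in X:
            d_pos = math.dist(x, self.pos)
            d_neg = math.dist(x, self.neg)
            total = d_pos + d_neg
            # Closer to the positive centroid gives a score nearer 1.
            scores.append(d_neg / total if total else 0.5)
        return scores


def fit_stage_one(views_train, y_train):
    """Stage 1: fit one preliminary model per modality (feature view)."""
    return {name: CentroidModel().fit(X, y_train) for name, X in views_train.items()}


def select_models(models, views_val, y_val, min_mcc=0.5):
    """Stage 2a: keep base models whose validation MCC reaches min_mcc."""
    kept = []
    for name, model in models.items():
        preds = [int(p >= 0.5) for p in model.predict_proba(views_val[name])]
        if mcc(y_val, preds) >= min_mcc:
            kept.append(name)
    return kept


def predict_stage_two(models, kept, views):
    """Stage 2b: integrate the kept models by averaging their scores."""
    probs = [models[name].predict_proba(views[name]) for name in kept]
    return [sum(col) / len(col) for col in zip(*probs)]


# Invented toy data: "seq" separates the classes, "noise" does not.
y_train = [1, 1, 0, 0]
views_train = {
    "seq": [[1.0, 1.0], [0.9, 1.1], [0.0, 0.0], [0.1, -0.1]],
    "noise": [[0.0], [1.0], [0.0], [1.0]],
}
y_val = [1, 0]
views_val = {"seq": [[0.8, 0.9], [0.2, 0.1]], "noise": [[1.0], [0.0]]}

models = fit_stage_one(views_train, y_train)
kept = select_models(models, views_val, y_val)
final = predict_stage_two(models, kept, views_val)
print(kept, [round(p, 3) for p in final])
```

In a real stacked ensemble the second stage would typically train a meta-learner on out-of-fold base-model outputs rather than average them; averaging is used here only to keep the sketch short and dependency-free.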