PLPTP: A Motif-based Interpretable Deep Learning Framework Based on Protein Language Models for Peptide Toxicity Prediction.
Journal:
Journal of Molecular Biology
PMID:
40158838
Abstract
Peptide toxicity prediction is important in drug development and biotechnology, because accurately identifying toxic peptide sequences is crucial for designing safer peptide-based drugs. This study proposes a deep learning model for peptide toxicity prediction that integrates Evolutionary Scale Modeling (ESM2), a Bidirectional Long Short-Term Memory (BiLSTM) network, and a Deep Neural Network (DNN). The ESM2 model captures evolutionary information from peptide sequences, providing rich context for each sequence; the BiLSTM network extracts contextual features, capturing long-range dependencies within the sequence; and the DNN classifies the extracted features to produce the final toxicity prediction. To enhance the reliability and transparency of the model, we also conducted motif analysis to identify key patterns in the data, which helps to explain the model's attention mechanism and its classification performance. To address class imbalance in the dataset, we employed Focal Loss as the loss function, which improves the model's ability to identify minority-class samples by reducing the contribution of easily classified samples. Experimental results demonstrate that the proposed model performs exceptionally well across multiple evaluation metrics, particularly on imbalanced data, with significant improvements over traditional methods. These results highlight the model's potential to improve the accuracy of peptide toxicity prediction and its value for drug development and biotechnology research. The PLPTP web server is available at https://www.bioai-lab.com/PLPTP.
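The down-weighting of easily classified samples by Focal Loss can be illustrated with a minimal sketch of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). This is not the PLPTP implementation: the function names, the label convention (1 = toxic), and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration, since the abstract does not report the settings used.

```python
import math

def cross_entropy(prob_toxic, label):
    """Plain binary cross-entropy for one sample (label 1 = toxic, assumed)."""
    eps = 1e-12
    p = min(max(prob_toxic, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if label == 1 else 1.0 - p        # probability given to the true class
    return -math.log(p_t)

def focal_loss(prob_toxic, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample.

    gamma and alpha are illustrative defaults, not values from the paper.
    """
    eps = 1e-12
    p = min(max(prob_toxic, eps), 1.0 - eps)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks toward 0 as the prediction becomes confident and correct
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

if __name__ == "__main__":
    # Hypothetical predictions: an easy toxic sample and a hard toxic sample.
    for name, prob in [("easy", 0.95), ("hard", 0.30)]:
        ce = cross_entropy(prob, 1)
        fl = focal_loss(prob, 1)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

The ratio of focal loss to cross-entropy is much smaller for the confidently correct sample than for the hard one, so the total loss is dominated by the hard, often minority-class, samples. Setting gamma to 0 removes the modulating factor and leaves an alpha-weighted cross-entropy.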