ToxPLTC: Peptide Toxicity Prediction by Integrating Pretrained T5 Protein Language Model and Text Convolutional Neural Network.

Journal: Journal of Chemical Information and Modeling

Abstract

Peptide-based therapeutics show promise in treating a range of diseases, such as diabetes, cancer, and chronic pain. However, toxicity, immunogenicity, and poor stability remain major obstacles to their direct clinical application. Traditional wet-lab toxicity testing is both costly and time-consuming. In contrast, deep learning classification models offer an efficient route to identifying peptide toxicity. In this study, we propose a deep learning framework called ToxPLTC, which uses the pretrained, Transformer-based ProtT5 protein language model to encode peptide sequences, adopts the borderline SMOTE algorithm to handle the imbalanced training set, and utilizes a text convolutional neural network combined with a fully connected layer for classification. Additionally, visualization analysis, motif analysis, and mutation-scan analysis are performed to understand the function of each module and enhance the interpretability of our model. An applicability domain is constructed with a K-NN strategy to define the range within which the model's predictions can be considered reliable. The ToxPLTC model achieves a balanced accuracy of 93.02% on independent test set 1 and 88.04% on independent test set 2. Experimental results demonstrate that our model outperforms existing models on the independent test sets and generalizes well. The ToxPLTC model has great potential as a valuable and robust tool for peptide-based drug development. The source data sets and code are available at the following GitHub repository: https://github.com/yunyunliang88/ToxPLTC.
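
To make two of the quantities named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The balanced accuracy function follows the standard definition (mean of sensitivity and specificity). The applicability-domain rule is an assumption for illustration only: a query counts as in-domain when its mean Euclidean distance to its k nearest training vectors does not exceed the training mean plus z standard deviations. The abstract states only that a K-NN strategy is used, so the distance metric, the threshold rule, and all function names here are hypothetical. Inputs stand in for embedding vectors such as those a protein language model would produce.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


def mean_knn_distance(x, reference, k):
    """Mean Euclidean distance from x to its k nearest vectors in reference."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


def fit_domain_threshold(train, k, z=1.0):
    """Assumed rule: threshold = mean + z * std of leave-one-out k-NN distances."""
    scores = []
    for i, x in enumerate(train):
        others = train[:i] + train[i + 1:]  # exclude the point itself
        scores.append(mean_knn_distance(x, others, k))
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    return mu + z * sd


def in_domain(x, train, k, threshold):
    """True when x lies close enough to the training data to trust a prediction."""
    return mean_knn_distance(x, train, k) <= threshold
```

In use, `fit_domain_threshold` would be called once on the training embeddings, and `in_domain` would then gate each prediction; queries outside the domain would be flagged as unreliable rather than classified silently.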
