PPSNO: A Feature-Rich SNO Sites Predictor by Stacking Ensemble Strategy from Protein Sequence-Derived Information.

Journal: Interdisciplinary sciences, computational life sciences
PMID:

Abstract

Protein S-nitrosylation (SNO) is a significant post-translational modification that affects the stability, activity, cellular localization, and function of proteins. Accurate prediction of SNO sites therefore helps elucidate the mechanisms underlying protein function. In this study, we constructed a predictor, named PPSNO, that predicts protein SNO sites using stacked ensemble learning. PPSNO integrates multiple machine learning techniques into an ensemble model to improve its predictive accuracy. First, we established benchmark datasets by collecting SNO sites from various sources, including the literature, databases, and other predictors. Second, we applied various feature-extraction techniques to derive features from protein sequences, which were then fused and used to train the PPSNO predictor. Five-fold cross-validation experiments showed that PPSNO outperformed existing predictors, such as PSNO, PreSNO, pCysMod, DeepNitro, RecSNO, and Mul-SNO. PPSNO achieved an accuracy of 92.8%, an area under the ROC curve (AUC) of 96.1%, a Matthews correlation coefficient (MCC) of 81.3%, an F1-score of 85.6%, a sensitivity (SN) of 79.3%, a specificity (SP) of 97.7%, and an average precision (AP) of 92.2%. We also used ROC curves, PR curves, and radar plots to illustrate the superior performance of PPSNO. Our study shows that fusing protein sequence features with a two-layer stacked ensemble model can improve the accuracy of SNO site prediction, which can aid in understanding cellular processes and disease mechanisms. The code and data are available at https://github.com/serendipity-wly/PPSNO .
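The threshold-based metrics reported above (SN, SP, accuracy, F1-score, MCC) all derive from the four confusion-matrix counts of a binary SNO/non-SNO classifier. The sketch below uses the conventional definitions and assumes the paper computes them the same way; the function name `binary_metrics` and the counts in the usage line are hypothetical illustrations, not PPSNO code or data. AUC and AP are not covered here because they require ranked prediction scores rather than counts.

```python
import math


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Conventional binary-classification metrics from confusion-matrix counts.

    Hypothetical helper: assumes the standard definitions, not PPSNO's own code.
    """
    sn = _safe_div(tp, tp + fn)             # sensitivity: fraction of true SNO sites recovered
    sp = _safe_div(tn, tn + fp)             # specificity: fraction of non-SNO sites rejected
    acc = _safe_div(tp + tn, tp + fp + tn + fn)
    precision = _safe_div(tp, tp + fp)      # fraction of predicted SNO sites that are real
    f1 = _safe_div(2 * precision * sn, precision + sn)
    # MCC uses all four counts, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, denom)
    return {"SN": sn, "SP": sp, "ACC": acc, "Precision": precision, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, chosen only to give round numbers.
    m = binary_metrics(tp=80, fp=20, tn=380, fn=20)
    print({k: round(v, 3) for k, v in m.items()})
    # → {'SN': 0.8, 'SP': 0.95, 'ACC': 0.92, 'Precision': 0.8, 'F1': 0.8, 'MCC': 0.75}
```

With many more negatives than positives, accuracy and specificity can be high while sensitivity is noticeably lower, which is the same pattern as the reported SN of 79.3% against an SP of 97.7%.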

Authors

  • Lun Zhu
    School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.
  • Liuyang Wang
    School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.
  • Zexi Yang
    School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.
  • Piao Xu
    College of Economics and Management, Nanjing Forestry University, Nanjing, 210037, China.
  • Sen Yang
    Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.