S-SulfPred: A sensitive predictor to capture S-sulfenylation sites based on a resampling one-sided selection undersampling-synthetic minority oversampling technique.

Journal: Journal of Theoretical Biology
Published Date:

Abstract

Protein S-sulfenylation is a reversible post-translational modification involving covalent attachment of hydroxide to the thiol group of cysteine residues, and it is involved in various biological processes including cell signaling, response to stress and protein function. Herein we present S-SulfPred, a support vector machine based model that captures potential S-sulfenylation sites and improves the efficiency and relevance of experimental identification of protein S-sulfenylation sites. One-sided selection (OSS) undersampling and synthetic minority oversampling technique (SMOTE) oversampling were combined to establish balanced training datasets. In an independent test, this combined approach performed better than OSS or SMOTE alone. The best combination of position-specific amino acid propensity and five physicochemical properties of amino acids was selected to optimize predictor performance. Using S-SulfPred, we achieve an average sensitivity of 74.62% and an average specificity of 71.62% on independent datasets. Compared with other published tools, S-SulfPred attains both higher sensitivity and higher specificity. We not only propose a highly accurate method to predict protein S-sulfenylation sites, but also provide insights that could improve the efficiency of other bioinformatics tools.
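
To make the resampling idea concrete, the following is a minimal sketch of how OSS undersampling and SMOTE oversampling might be chained on a toy dataset. It is not the authors' implementation: the abstract does not specify the order of the two steps, the neighbor count, or the distance measure, so applying OSS first, using Euclidean distance, and the names `one_sided_selection`, `smote`, `resample`, `k` and `seed` are all assumptions made for illustration. Label 1 stands for the minority class (modified sites) and label 0 for the majority class.

```python
import math
import random


def nearest(i, X, candidates):
    """Return the index in `candidates` (other than i) whose point is closest to X[i]."""
    return min((j for j in candidates if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def one_sided_selection(X, y, rng):
    """Simplified OSS: keep every minority sample, keep only informative
    majority samples, then drop majority samples that form Tomek links."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Seed set: all minority samples plus one randomly chosen majority sample.
    seed = minority + [rng.choice(majority)]
    kept = list(seed)
    # Keep a majority sample only if a 1-NN rule on the seed set misclassifies it.
    for i in majority:
        if i not in seed and y[nearest(i, X, seed)] != y[i]:
            kept.append(i)
    # A Tomek link is a pair of opposite-class samples that are each other's
    # nearest neighbor; the majority member of each such pair is removed.
    tomek = set()
    for i in kept:
        if y[i] == 0:
            j = nearest(i, X, kept)
            if y[j] == 1 and nearest(j, X, kept) == i:
                tomek.add(i)
    kept = [i for i in kept if i not in tomek]
    return [X[i] for i in kept], [y[i] for i in kept]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbors (needs >= 2 samples)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def resample(X, y, k=3, seed=0):
    """Assumed pipeline: OSS undersampling, then SMOTE until classes balance."""
    rng = random.Random(seed)
    X_u, y_u = one_sided_selection(X, y, rng)
    minority = [x for x, label in zip(X_u, y_u) if label == 1]
    n_majority = len(y_u) - len(minority)
    n_new = max(0, n_majority - len(minority))
    synthetic = smote(minority, n_new, k, rng)
    return X_u + synthetic, y_u + [1] * len(synthetic)
```

Calling `resample(X, y)` on a list of feature tuples `X` and 0/1 labels `y` returns a resampled training set that could then be passed to a support vector machine. In practice, a maintained library implementation of both techniques would likely be preferable to this hand-written sketch.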

Authors

  • Cangzhi Jia
    Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China. Electronic address: cangzhijia@dlmu.edu.cn.
  • Yun Zuo
    Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.