MlyPredCSED: based on extreme point deviation compensated clustering combined with cross-scale convolutional neural networks to predict multiple lysine sites in human.

Journal: Briefings in bioinformatics
PMID:

Abstract

In post-translational modification (PTM), chemical groups covalently attached to lysine residues significantly change the physical and chemical properties of proteins. These modifications shape protein structure, enhance function and stability, and are vital for physiological processes, affecting health and disease through mechanisms such as gene expression, signal transduction, protein degradation, and cell metabolism. Although lysine (K) modifications are among the most common types of post-translational modification in proteins, research on K-PTMs has largely overlooked the synergistic effects between different modifications and has lacked techniques for addressing sample imbalance. To address these gaps, this study proposes the Extreme Point Deviation Compensated Clustering (EPDCC) Undersampling algorithm and combines it with Cross-Scale Convolutional Neural Networks (CSCNNs) to develop a novel computational tool, MlyPredCSED, for simultaneously predicting multiple lysine modification sites. MlyPredCSED uses Multi-Label Position-Specific Triad Amino Acid Propensity together with the physicochemical properties of amino acids to enrich the sequence information. To handle sample imbalance, the EPDCC Undersampling technique adjusts the majority class samples. The model is trained and tested within the CSCNN framework. In cross-validation and testing, MlyPredCSED outperformed existing models, especially in complex categories with multiple modification sites. This research not only provides an efficient method for identifying lysine modification sites but also demonstrates its value in biological research and drug development. To make MlyPredCSED easy for researchers to use, we have developed a free web tool: http://www.mlypredcsed.com.
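The abstract does not describe how EPDCC selects majority class samples, so the following is only a generic illustration of clustering-based undersampling, not the authors' EPDCC algorithm. It is a minimal sketch that assumes plain k-means and keeps the real majority sample closest to each centroid; the function names (`kmeans`, `cluster_undersample`) and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; initial centroids are the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def cluster_undersample(majority, n_keep):
    """Keep n_keep majority samples: the real sample closest to each centroid."""
    centroids = kmeans(majority, n_keep)
    kept = []
    for c in centroids:
        # Skip samples already selected so that no sample is kept twice.
        candidates = [p for p in majority if p not in kept]
        kept.append(min(candidates, key=lambda p: dist(p, c)))
    return kept


# Toy data: six majority samples in two tight groups, two minority samples.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
minority = [(2.0, 2.0), (2.1, 2.0)]

# Reduce the majority class to the size of the minority class.
balanced_majority = cluster_undersample(majority, n_keep=len(minority))
```

In this sketch the majority class shrinks to the size of the minority class while one representative from each dense region is retained. A deviation-compensated scheme such as EPDCC presumably chooses representatives differently, but the abstract gives no details.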

Authors

  • Yun Zuo
    Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
  • Xingze Fang
    School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China.
  • Jiankang Chen
    School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China.
  • Jiayi Ji
    Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY.
  • Yuwen Li
  • Zeyu Wu
    School of Food and Biological Engineering, Hefei University of Technology, Hefei 230601, China; Engineering Research Center of Bio-Process, Ministry of Education, Hefei University of Technology, Hefei 230601, China. Electronic address: wuzeyu@hfut.edu.cn.
  • Xiangrong Liu
  • Xiangxiang Zeng
    Department of Computer Science, Hunan University, Changsha, China.
  • Zhaohong Deng
    School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China.
  • Hongwei Yin
    School of Information Engineering, Huzhou University, Huzhou 313000, China.
  • Anjing Zhao
    Department of Oncology, The First Affiliated Hospital of Naval Military Medical University, Shanghai 200000, China.