O-glycosylation site prediction for by combining properties and sequence features with support vector machine.

Journal: Journal of bioinformatics and computational biology
Published Date:

Abstract

O-glycosylation is a protein posttranslational modification that plays an important regulatory role in almost all cells and is associated with a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is the key to further investigating the molecular mechanism of protein posttranslational modification. This study aimed to collect a reliable dataset on and develop an O-glycosylation predictor for , named , through multiple features. Random undersampling and the synthetic minority oversampling technique (SMOTE) were employed to deal with the imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize the feature vectors and improve the performance of the model. After a comprehensive comparison of traditional machine learning and deep learning classifiers, a support vector machine was selected because it performed best and was used to train and optimize the final prediction model. On the independent test set, outperformed the existing O-glycosylation tool, suggesting that could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.
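As an illustration of two steps named in the abstract, the following is a minimal sketch of random undersampling followed by Kruskal-Wallis feature ranking. It is not the authors' code from the repository: the function names, the toy data, and the choice to omit the tie correction in the H statistic are all assumptions made for this example, and the SMOTE and support vector machine training steps are left out. It uses only the Python standard library.

```python
import random


def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(column, y):
    """H statistic of one feature across the classes (no tie correction)."""
    n = len(column)
    ranks = average_ranks(column)
    total = 0.0
    for label in set(y):
        group = [r for r, lab in zip(ranks, y) if lab == label]
        total += sum(group) ** 2 / len(group)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def rank_features(X, y):
    """Feature indices sorted from highest to lowest H statistic."""
    scores = [kruskal_wallis_h([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.90, 0.20], [0.80, 0.70], [0.10, 0.30], [0.20, 0.50],
     [0.15, 0.40], [0.05, 0.60], [0.30, 0.35], [0.25, 0.45]]
y = [1, 1, 0, 0, 0, 0, 0, 0]  # 1 = glycosylated site, 0 = non-site

X_bal, y_bal = random_undersample(X, y, seed=0)
print("balanced class counts:", y_bal.count(1), y_bal.count(0))
print("feature ranking:", rank_features(X_bal, y_bal))
```

In a fuller pipeline, the top-ranked features would be passed to the classifier, and any resampling would be applied to the training split only so that the independent test set keeps its original class distribution.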

Authors

  • Yan Zhu
    Department of Chemistry, Xixi Campus, Zhejiang University, Hangzhou, 310028, China. Electronic address: zhuyan@zju.edu.cn.
  • Shuwan Yin
    School of Science, Dalian Maritime University, Dalian 116026, P. R. China.
  • Jia Zheng
    School of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China.
  • Yixia Shi
    School of Mathematics and Statistics, Lingnan Normal University, Zhanjiang 524048, P. R. China.
  • Cangzhi Jia
    Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China. Electronic address: cangzhijia@dlmu.edu.cn.