O-glycosylation site prediction for by combining properties and sequence features with support vector machine.
Journal:
Journal of bioinformatics and computational biology
Published Date:
Nov 19, 2021
Abstract
O-glycosylation is a protein posttranslational modification important in regulating almost all cells. It is related to a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is the key to further investigating the molecular mechanism of protein posttranslational modification. This study aimed to collect a reliable dataset on and develop an O-glycosylation predictor for , named , through multiple features. A random undersampling method and a synthetic minority oversampling technique were employed to deal with imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize feature vectors and improve the performance of the model. A support vector machine, due to its optimal performance, was used to train and optimize the final prediction model after a comprehensive comparison of various classifiers in traditional machine learning methods and deep learning. On the independent test set, outperformed the existing O-glycosylation tool, suggesting that could provide more instructive guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.