Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou's general PseAAC.
Journal:
Journal of Theoretical Biology
Published Date:
May 21, 2016
Abstract
As a new type of post-translational modification, lysine phosphoglycerylation plays a key role in regulating the glycolytic process and metabolism in cells. Because traditional experimental methods are time-consuming and labor-intensive, it is important to develop computational methods to identify potential phosphoglycerylation sites. However, the prediction performance of the existing phosphoglycerylation site predictor is not satisfactory. In this study, a novel predictor named CKSAAP_PhoglySite is developed to predict phosphoglycerylation sites by using the composition of k-spaced amino acid pairs and a fuzzy support vector machine. On the one hand, after assessing several encoding schemes, we find that the composition of k-spaced amino acid pairs is more suitable than the others for representing the protein sequence around phosphoglycerylation sites. On the other hand, the proposed fuzzy support vector machine algorithm can effectively handle the class imbalance and noise in the phosphoglycerylation site training dataset. Experimental results indicate that CKSAAP_PhoglySite significantly outperforms the existing phosphoglycerylation site predictor Phogly-PseAAC. A MATLAB software package for CKSAAP_PhoglySite can be freely downloaded from https://github.com/juzhe1120/Matlab_Software/blob/master/CKSAAP_PhoglySite_Matlab_Software.zip.
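To illustrate the encoding named in the abstract, the following is a minimal Python sketch of a composition of k-spaced amino acid pairs (CKSAAP) feature vector for one sequence window. It is not the authors' MATLAB implementation. The function name `cksaap`, the default `k_max=4`, the restriction to the 20 standard residues, and the normalization by the number of pairs in the window are all assumptions made for illustration; the abstract does not state these details.

```python
from itertools import product

# The 20 standard amino acids (assumption: gap or unknown residues are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(window, k_max=4):
    """Return the CKSAAP feature vector of a sequence window.

    For each spacing k in 0..k_max, count every residue pair (a, b) where
    b follows a with exactly k residues in between, then divide by the
    number of such positions in the window. The result has
    400 * (k_max + 1) entries.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(window) - k - 1  # number of k-spaced positions
        for i in range(max(total, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard symbols
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features


# Hypothetical toy window, far shorter than a real window around a lysine site.
vec = cksaap("AKAKA", k_max=1)
print(len(vec))
```

In a pipeline of the kind the abstract describes, one such vector would be computed per candidate lysine window and passed to the classifier; the window length and the largest spacing would need to be tuned on the training data.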