gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.
Journal:
Journal of theoretical biology
Published Date:
Oct 7, 2016
Abstract
DNA-binding proteins are the functional proteins in cells, which play an important role in various essential biological activities. An effective and fast computational method gDNA-Prot is proposed to predict DNA-binding proteins in this paper, which is a DNA-binding predictor that combines the support vector machine classifier and a novel kind of feature called graphical representation. The DNA-binding protein sequence information was described with the 20 probabilities of amino acids and the 23 new numerical graphical representation features of a protein sequence, based on 23 physicochemical properties of 20 amino acids. The Principal Components Analysis (PCA) was employed as feature selection method for removing the irrelevant features and reducing redundant features. The Sigmod function and Min-max normalization methods for PCA were applied to accelerate the training speed and obtain higher accuracy. Experiments demonstrated that the Principal Components Analysis with Sigmod function generated the best performance. The gDNA-Prot method was also compared with the DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed the DNAbinder and iDNA-Prot. Although the DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and convenient to predict the DNA-binding proteins. Additionally, the proposed gNDA-Prot method is available at http://sourceforge.net/projects/gdnaprot.