SecProMTB: Support Vector Machine-Based Classifier for Secretory Proteins Using Imbalanced Data Sets Applied to Mycobacterium tuberculosis.

Journal: Proteomics
Published Date:

Abstract

Secretory proteins of Mycobacterium tuberculosis have created more concern, given their dominant immunogenicity and role in pathogenesis. In view of expensive and time-consuming traditional biochemical experiments, an advanced support vector machine model named SecProMTB is constructed in this study and the proteins are identified by a bioinformatic approach. First, an improved pseudo-amino acid composition (PseAAC) algorithm is used to extract features from all entities. Second, a novel imbalanced-data strategy is proposed and adopted to divide the original data set into train set and test set. Third, to overcome the overfitting problem, feature-ranking algorithms are applied with an increment feature selection. Finally, the model is trained and optimized. Consequently, a model is obtained with an area under the curve of 0.862 and average accuracy of 86% in the independent test. For the convenience of users, SecProMTB and related data are openly accessible at http://server.malab.cn/SecProMTB/index.jsp.

Authors

  • Chaolu Meng
    College of Intelligence and Computing, Tianjin University, 300350, Tianjin, China.
  • Leyi Wei
    School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.
  • Quan Zou