A Feature Selection Approach Based on Interclass and Intraclass Relative Contributions of Terms.

Journal: Computational Intelligence and Neuroscience
Published Date:

Abstract

Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore propose IIRCT, a feature selection approach based on the interclass and intraclass relative contributions of terms. The proposed algorithm jointly considers three critical factors: term frequency, the interclass relative contribution of terms, and the intraclass relative contribution of terms. Experiments with a kNN classifier on the 20 NewsGroup and SougouCS corpora show that IIRCT achieves better performance than the DF, t-Test, and CMFS algorithms.
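
The abstract does not state the IIRCT scoring formula, so it is not reproduced here. For orientation only, the following is a minimal sketch of document frequency (DF) selection, the simplest of the baselines named above, assuming documents are already tokenized. The function name, the alphabetical tie-breaking rule, and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_by_document_frequency(documents, k):
    """Return the k terms that occur in the most documents.

    documents: list of token lists (tokenization is assumed done already).
    Ties are broken alphabetically so the result is deterministic; this
    tie-breaking rule is an assumption, not part of the paper.
    """
    df = Counter()
    for tokens in documents:
        # Count each term once per document, however often it repeats.
        df.update(set(tokens))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus, invented purely for illustration.
docs = [
    ["goal", "team", "match"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
]
top_terms = select_by_document_frequency(docs, 2)
```

DF ranks terms by coverage alone and ignores how a term is distributed across classes, which is the kind of information the interclass and intraclass factors described in the abstract are meant to add.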

Authors

  • Hongfang Zhou
    School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China.
  • Jie Guo
    School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China.
  • Yinghui Wang
    Shandong Luoxin Pharmaceutical Group Stock Co. Ltd, Linyi, Shandong, China.
  • Minghua Zhao
    School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, Shaanxi 710048, China.