Effect of incremental feature enrichment on healthcare text classification system: A machine learning paradigm.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: Healthcare tweets are particularly challenging to classify because of their sparse layout and limited character count. Compared with previous methods based on the "bag of words" (BOW) model, this study uniquely identifies an enrichment protocol and examines how semantically different aspects of feature selection, such as BOW (feature F0), term frequency-inverse document frequency (TF-IDF, feature F1), and latent semantic indexing (LSI, feature F2), improve overall performance when applied sequentially with a classifier.
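
The enrichment chain named in the abstract (F0, then F1, then F2) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names bow, tfidf, and lsi, the smoothed IDF formula, the choice of k=2, the power-iteration approximation of the truncated SVD, and the four toy texts are all assumptions made for illustration only. The classifier stage is omitted; each of the three representations would be passed to a classifier in turn.

```python
import math
from collections import Counter


def bow(docs):
    """F0: raw term counts per document over a shared, sorted vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(d.lower().split()).items():
            row[index[tok]] = float(count)
        rows.append(row)
    return vocab, rows


def tfidf(rows):
    """F1: weight counts by a smoothed IDF (an assumed variant), then L2-normalise."""
    n = len(rows)
    df = [sum(1 for r in rows if r[j] > 0) for j in range(len(rows[0]))]
    idf = [math.log((1 + n) / (1 + d)) + 1.0 for d in df]
    out = []
    for r in rows:
        w = [x * i for x, i in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        out.append([x / norm for x in w])
    return out


def lsi(rows, k=2, iters=200):
    """F2: project documents onto k leading right singular vectors.

    The vectors are approximated by power iteration on A^T A, with
    Gram-Schmidt deflation against the components already found.
    """
    m = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    comps = []
    for _ in range(k):
        v = [1.0 + j for j in range(m)]  # deterministic, non-uniform start
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
            for c in comps:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        comps.append(v)
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in rows]


# Hypothetical toy texts, not data from the study.
docs = [
    "flu fever cough",
    "fever cough flu shot",
    "diet exercise sleep",
    "sleep diet yoga",
]
vocab, f0 = bow(docs)   # F0: sparse counts
f1 = tfidf(f0)          # F1: reweighted, unit-length rows
f2 = lsi(f1, k=2)       # F2: dense two-dimensional latent representation
```

Each stage consumes the output of the previous one, so a comparison of classifier accuracy on f0, f1, and f2 would show what each enrichment step contributes. In practice a library routine for truncated SVD would replace the power iteration used here.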

Authors

  • Saurabh Kumar Srivastava
    Department of Computer Science & Engineering, JIIT, Noida, India.
  • Sandeep Kumar Singh
    Department of Computer Science & Engineering, JIIT, Noida, India.
  • Jasjit S Suri
    Advanced Knowledge Engineering Center, Global Biomedical Technologies, Inc., Roseville, CA, USA. Electronic address: jsuri@comcast.net.