Enhancing machine learning-based sentiment analysis through feature extraction techniques.

Journal: PLoS One
PMID:

Abstract

Feature extraction is a crucial part of sentiment classification because it determines what information is drawn from the text data, which in turn affects the model's performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. To provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. Several methods are considered: Bag-of-Words (BOW), Word2Vec, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To assess each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. We then trained a random forest classifier on 70% of the data and tested it on the remaining 30%, enabling us to evaluate and compare performance using different metrics. Our results show that the TF-IDF technique performs best, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the importance of feature extraction in sentiment analysis and offers practical insights for improving model performance and guiding future research.
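
To make the difference between two of the feature extractors named above concrete, the following is a minimal sketch of Bag-of-Words counts and TF-IDF weights computed by hand. It is not the authors' implementation: the abstract does not state which TF-IDF variant was used, so the smoothed IDF, ln((1 + n) / (1 + df)) + 1, and the L2 normalization are assumptions, as are the toy reviews, the whitespace tokenizer, and the function names.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors): one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors).

    Assumed variant: smoothed IDF, ln((1 + n) / (1 + df)) + 1,
    followed by L2 normalization of each document vector.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1.0 for d in df]
    vectors = []
    for vec in counts:
        weighted = [tf * w for tf, w in zip(vec, idf)]
        norm = math.sqrt(sum(x * x for x in weighted))
        vectors.append([x / norm if norm else 0.0 for x in weighted])
    return vocab, vectors


# Hypothetical toy reviews, not taken from either dataset in the study.
reviews = [
    "great flight great crew",
    "terrible delay and rude crew",
    "great guitar strings",
]

vocab, bow = bag_of_words(reviews)
_, tfidf = tf_idf(reviews)

for text, vec in zip(reviews, tfidf):
    weights = {t: round(w, 3) for t, w in zip(vocab, vec) if w > 0}
    print(text, "->", weights)
```

In the second review, "crew" receives a lower weight than "delay" because "crew" also appears in another review. Bag-of-Words would give both terms the same count of 1. Vectors of either kind could then be passed to a classifier such as a random forest after a 70/30 train/test split.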

Authors

  • Noura A Semary
    Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shibin El Kom, Egypt.
  • Wesam Ahmed
    Department of Information Technology, Faculty of Computers and Information, Menoufia University, Shibin El Kom, Egypt.
  • Khalid Amin
    Information Technology Department, Faculty of Computers and Information, Menoufia University, Shibin El Kom 32511, Egypt.
  • Pawel Plawiak
    Institute of Telecomputing, Faculty of Physics, Mathematics and Computer Science, Cracow University of Technology, Krakow, Poland.
  • Mohamed Hammad
    Information Technology Department, Faculty of Computers and Information, Menoufia University, Shebin El-Koom 32511, Egypt.