TriLex: A fusion approach for unsupervised sentiment analysis of short texts.

Journal: PLOS ONE
PMID:

Abstract

In recent years, online customer reviews and social media platforms have significantly influenced people's daily lives. Although the texts posted on these platforms are generally short, they convey a wide range of user sentiments. However, sentiment analysis of short texts is challenging because of their limited context. In addition, traditional supervised machine learning methods often struggle with the dynamic nature of sentiment expression and with the scarcity of labeled data, which is costly to obtain. To address these challenges, this paper proposes TriLex, a novel unsupervised approach that leverages the majority vote of multiple lexicon-based sentiment analysis tools. TriLex assigns strong labels to texts on which TextBlob, VADER, and AFINN agree and weak labels to texts on which they disagree. To improve sentiment labeling, TriLex normalizes the sentiment scores of all lexicons and applies weighted averaging to compute a majority-vote sentiment score. It then relabels each weakly labeled text using a dynamic threshold derived from this majority-vote score. The effectiveness of TriLex is evaluated on benchmark datasets in terms of the accuracy, F1 score, precision, and recall of Logistic Regression, XGBoost, and LSTM models. TriLex improves sentiment prediction accuracy by 2% to 8%. Overall, our results demonstrate that TriLex outperforms methods that rely on individual lexicons as well as existing fusion-based alternatives.
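The labeling pipeline summarized above (agreement gives strong labels, disagreement gives weak labels that are relabeled from a normalized, weighted score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes precomputed raw scores instead of calling TextBlob, VADER, or AFINN; it normalizes each lexicon by its largest absolute score in the dataset; it uses equal per-lexicon weights; and it uses the dataset mean of the weighted scores as the "dynamic threshold". The function names `normalize`, `polarity`, and `trilex_labels` are hypothetical.

```python
from statistics import mean


def normalize(column):
    """Scale one lexicon's scores into [-1, 1] by its largest absolute value.

    Assumption: the paper's exact normalization is not given in the abstract.
    TextBlob polarity and VADER compound already lie in [-1, 1]; AFINN sums do not.
    """
    peak = max(abs(s) for s in column) or 1.0  # avoid division by zero for all-zero columns
    return [s / peak for s in column]


def polarity(score):
    """Map a normalized score to a discrete sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def trilex_labels(raw_scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical sketch of strong/weak labeling with a weighted-average fallback.

    raw_scores: list of (textblob, vader, afinn) raw score tuples, one per text.
    weights: assumed per-lexicon weights (equal by default).
    Returns a list of (label, kind) pairs, where kind is "strong" or "weak".
    """
    columns = list(zip(*raw_scores))  # one tuple of scores per lexicon
    normalized = list(zip(*(normalize(c) for c in columns)))  # back to one row per text
    combined = [sum(w * s for w, s in zip(weights, row)) for row in normalized]
    threshold = mean(combined)  # assumed "dynamic" threshold: dataset mean of weighted scores

    results = []
    for row, score in zip(normalized, combined):
        votes = {polarity(s) for s in row}
        if len(votes) == 1:
            # All three lexicons agree: keep their shared label as a strong label.
            results.append((votes.pop(), "strong"))
        else:
            # Lexicons disagree: relabel from the weighted score relative to the threshold.
            label = "positive" if score > threshold else "negative"
            results.append((label, "weak"))
    return results


if __name__ == "__main__":
    scores = [(0.5, 0.6, 4), (-0.3, -0.4, -2), (0.1, -0.2, 3), (0.0, 0.0, 0)]
    print(trilex_labels(scores))
    # [('positive', 'strong'), ('negative', 'strong'), ('positive', 'weak'), ('neutral', 'strong')]
```

In this toy input, only the third text triggers a disagreement (TextBlob and AFINN are positive, VADER is negative); its weighted score of about 0.21 exceeds the mean of about 0.15, so it receives a weak positive label. Note that the sketch relabels weak texts only as positive or negative; how TriLex handles neutral weak labels is not specified in the abstract.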

Authors

  • Abdulrahman Alharbi
    Department of Computer and Information Sciences, Temple University, Philadelphia, Pennsylvania, United States of America.
  • Rafaa Aljurbua
    Department of Computer and Information Sciences, Temple University, Philadelphia, Pennsylvania, United States of America.
  • Shelly Gupta
    Department of Computer and Information Sciences, Temple University, Philadelphia, Pennsylvania, United States of America.
  • Zoran Obradovic