Evaluating sentiment analysis models: A comparative analysis of vaccination tweets during the COVID-19 phase leveraging DistilBERT for enhanced insights.

Journal: MethodsX
Published Date:

Abstract

This study investigates public sentiment toward COVID-19 vaccination by analyzing Twitter data with machine learning (ML) and natural language processing (NLP) techniques. Because social media is a valuable source for gauging public opinion during health crises, the research aims to inform policies on content moderation and misinformation control.

  • Comparative Analysis of Embedding Techniques and ML Models: The study evaluates two embedding techniques (TF-IDF and Word2Vec) across five ML models: LinearSVC, Random Forest, Gradient Boosting Machine (GBM), XGBoost, and AdaBoost.
  • The models were tested with two training-testing splits (70-30 and 80-20) to assess their performance on noisy, unlabeled, and imbalanced sentiment data.
  • Utilization of DistilBERT for Pseudo-Labeling: To improve labeling accuracy, DistilBERT was used to pseudo-label the tweets, capturing semantic nuances that traditional ML techniques often miss. This enabled more effective sentiment classification of tweets.

The findings underscore the effectiveness of automated annotation, hybrid modeling, and embedding strategies for analyzing unstructured social media data. Such approaches provide valuable insights for public health applications, particularly for understanding vaccine hesitancy and shaping communication strategies. The study highlights the potential of integrating advanced NLP techniques to better understand and respond to public sentiment during pandemics and similar emergencies.
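To make the split-based comparison concrete, here is a minimal, standard-library-only sketch of the TF-IDF branch evaluated under 70-30 and 80-20 splits. It is an illustration under stated assumptions, not the authors' pipeline: the tweets and their labels are hypothetical (the labels stand in for DistilBERT pseudo-labels), the IDF uses the plain log(N/df) form (libraries such as scikit-learn default to a smoothed variant), and a simple nearest-centroid cosine classifier is swapped in for the five models named above, none of which are reimplemented here. The Word2Vec and DistilBERT steps are not shown.

```python
import math
import random
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased whitespace tokens; a real tweet pipeline would also strip URLs, @mentions and emojis.
    return text.lower().split()

def fit_idf(docs):
    """Assumed IDF form: idf[term] = log(N / df), fitted on training documents only."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(len(docs) / count) for term, count in df.items()}

def normalize(vec):
    # Scale a sparse vector to unit L2 length (empty dict if all weights are zero).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen during fitting are dropped."""
    counts = Counter(tokenize(doc))
    return normalize({t: c * idf[t] for t, c in counts.items() if t in idf})

def cosine(a, b):
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def split(samples, train_ratio, seed=42):
    """Shuffle with a fixed seed, then cut at train_ratio (0.7 for 70-30, 0.8 for 80-20)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def evaluate(samples, train_ratio):
    train, test = split(samples, train_ratio)
    idf = fit_idf([text for text, _ in train])
    # Stand-in classifier: one centroid per sentiment class (normalised sum of its training vectors).
    sums = defaultdict(lambda: defaultdict(float))
    for text, label in train:
        for t, v in tfidf(text, idf).items():
            sums[label][t] += v
    centroids = {label: normalize(vec) for label, vec in sums.items()}
    correct = 0
    for text, label in test:
        vec = tfidf(text, idf)
        predicted = max(centroids, key=lambda lab: cosine(vec, centroids[lab]))
        correct += predicted == label
    return len(train), len(test), correct / len(test)

# Hypothetical toy tweets; the labels stand in for DistilBERT pseudo-labels.
TWEETS = [
    ("got my first vaccine dose today feeling grateful", "positive"),
    ("vaccine appointment was quick and the staff were great", "positive"),
    ("so relieved my parents are fully vaccinated now", "positive"),
    ("booster done grateful for the science", "positive"),
    ("happy to see vaccination rates going up", "positive"),
    ("worried about vaccine side effects nobody talks about", "negative"),
    ("arm is sore and i feel awful after the shot", "negative"),
    ("mandates are forcing people who are scared", "negative"),
    ("waited hours and the clinic ran out of vaccine", "negative"),
    ("not trusting a vaccine rushed this fast", "negative"),
]

results = {}
for ratio in (0.7, 0.8):
    results[ratio] = evaluate(TWEETS, ratio)
    n_train, n_test, acc = results[ratio]
    print(f"{round(ratio * 100)}-{round((1 - ratio) * 100)} split: "
          f"train={n_train} test={n_test} accuracy={acc:.2f}")
```

Fitting the IDF on the training portion only keeps test tweets from leaking into the vocabulary statistics. With only ten tweets the accuracies are not meaningful; on real, imbalanced data a per-class metric such as macro-F1 would be more informative than plain accuracy.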

Authors

  • Renuka Agrawal
    Symbiosis Institute of Technology - Pune Campus, Symbiosis International (Deemed University), Pune, India.
  • Mehuli Majumder
    Symbiosis Institute of Technology - Pune Campus, Symbiosis International (Deemed University), Pune, India.
  • Ishita Yadav
    Symbiosis Institute of Technology - Pune Campus, Symbiosis International (Deemed University), Pune, India.
  • Nandini Taneja
    Symbiosis Institute of Technology - Pune Campus, Symbiosis International (Deemed University), Pune, India.
  • Safa Hamdare
    Nottingham Trent University - Clifton Campus, Nottingham, UK.
  • Preeti Hemnani
    Department of Electronics and Telecommunication Engineering, SIES Graduate School of Technology, Mumbai, India.
