A hybrid deep learning model for sentiment analysis of COVID-19 tweets with class balancing.

Journal: Scientific Reports
Published Date:

Abstract

The widespread dissemination of misinformation and the diverse public sentiment observed during the COVID-19 pandemic highlight the necessity for accurate sentiment analysis of social media discourse. This study proposes a hybrid deep learning (DL) model that integrates Bidirectional Encoder Representations from Transformers (BERT) for contextual feature extraction with Long Short-Term Memory (LSTM) networks for sequential learning to classify COVID-19-related sentiments. To enhance data quality, advanced text preprocessing techniques, including Unicode normalization, contraction expansion, and emoji conversion, are applied. Additionally, to mitigate class imbalance, Random OverSampling (ROS) is employed, leading to significant improvements in model performance. Before applying ROS, the model exhibited lower accuracy and inconsistent performance across sentiment categories. After balancing the dataset, accuracy for binary classification increased to 92.10%, with corresponding precision, sensitivity, and specificity of 92.10%, 92.10%, and 91.50%, respectively. For three-class sentiment classification, accuracy improved to 89.47%, with precision, sensitivity, and specificity of 89.80%, 89.47%, and 94.10%, respectively. In five-class sentiment classification, accuracy reached 81.78%, with precision, sensitivity, and specificity of 82.19%, 81.78%, and 95.28%, respectively. These findings demonstrate the efficacy of combining deep learning-based sentiment analysis with advanced text preprocessing and class balancing techniques for accurately classifying public sentiment related to COVID-19 across multiple sentiment categories.
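The class-balancing step named in the abstract, Random OverSampling (ROS), can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the `seed` parameter, and the toy tweets are assumptions made for illustration, and only the Python standard library is used. The sketch duplicates randomly chosen samples from each smaller class until every class matches the size of the largest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    Returns new (texts, labels) lists in shuffled order.
    """
    if not texts:
        return [], []

    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    paired = []
    for label, items in by_class.items():
        # Keep all original samples, then draw duplicates with replacement.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        paired.extend((text, label) for text in items + extra)

    # Shuffle so duplicated samples are not clustered by class.
    rng.shuffle(paired)
    return [t for t, _ in paired], [l for _, l in paired]


# Toy, made-up data: four positive tweets, one negative, one neutral.
toy_texts = ["great news", "feeling safe", "vaccine works", "so relieved",
             "lockdown again", "cases reported today"]
toy_labels = ["positive", "positive", "positive", "positive",
              "negative", "neutral"]

balanced_texts, balanced_labels = random_oversample(toy_texts, toy_labels, seed=1)
print(Counter(toy_labels))
print(Counter(balanced_labels))
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated samples do not leak into the evaluation data; the abstract does not state how the split was handled.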

Authors

  • Md Alamin Talukder
    Department of Computer Science and Engineering, Jagannath University, Dhaka, Bangladesh. Electronic address: alamintalukder.cse.jnu@gmail.com.
  • Md Ashraf Uddin
    School of Information Technology, Deakin University, Geelong 3125, Australia.
  • Suman Roy
    Department of Computer Science and Engineering, Jagannath University, Dhaka, 1100, Bangladesh.
  • Partho Ghose
    Department of Biological and Agricultural Engineering, Texas A&M University, College Station, Texas, USA. Electronic address: partho.ghose@tamu.edu.
  • Smita Sarker
    Department of Statistics, Hajee Mohammad Danesh Science and Technology University, Dinajpur, 5200, Bangladesh.
  • Ansam Khraisat
    School of Information Technology, Deakin University, Geelong 3125, Australia.
  • Mohsin Kazi
    Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box-2457, Riyadh 11451, Saudi Arabia. Electronic address: mkazi@ksu.edu.sa.
  • Md Momtazur Rahman
    Department of English and Modern Languages, International University of Business Agriculture and Technology, Dhaka, Bangladesh.
  • Musawer Hakimi
    Department of Computer Science, Samangan University, Northeast Aybak, Samangan Province, Afghanistan. Electronic address: musawer.hakimi@smgu.edu.af.