BanglaNewsClassifier: A machine learning approach for news classification in Bangla Newspapers using hybrid stacking classifiers.

Journal: PLOS ONE
Published Date:

Abstract

Bangla news floods the web, and the need for smarter, more efficient classification techniques is greater than ever. Previous studies focused mostly on traditional models, overlooking the potential of hybrid techniques to handle ever-growing, complex datasets and the linguistic patterns of Bangla in pursuit of higher accuracy. To address this challenge, this study presents a comprehensive approach to classifying Bangla news articles into eight distinct categories using machine learning and deep learning techniques. Our experiments covered traditional machine learning algorithms, deep learning architectures, and hybrid models, including novel stacking classifiers. We used a dataset of 118,404 Bangla news articles and applied feature extraction techniques including TF-IDF vectorization and Word2Vec embeddings. Our best-performing model, a stacking meta-classifier combining a bidirectional long short-term memory (BiLSTM) network and a support vector machine (SVM), achieved 94% accuracy, outperforming all of the individual base models. We also provide an in-depth analysis of model performance, including confusion matrices, ROC curves, and error analysis, offering insight into the strengths and limitations of each approach. This research contributes to Bangla natural language processing and demonstrates the efficacy of ensemble methods and deep learning for news classification in low-resource languages.
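TF-IDF vectorization is one of the feature extraction steps named above. As a minimal sketch, and not the authors' pipeline, the pure-Python code below turns already-tokenised documents into sparse, L2-normalised TF-IDF vectors. The function names `fit_idf` and `transform` are hypothetical, whitespace tokenisation of Bangla text is a simplifying assumption, and the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 with raw term counts mirrors scikit-learn's `TfidfVectorizer` defaults rather than any setting reported in the paper.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn smoothed IDF weights from tokenised training documents (hypothetical helper).

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of
    documents and df(t) is the number of documents containing token t.
    """
    n = len(docs)
    # Count each token at most once per document to get its document frequency.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}


def transform(doc, idf):
    """Map one tokenised document to an L2-normalised sparse TF-IDF vector (hypothetical helper)."""
    # Raw term counts; tokens never seen during fitting are ignored.
    tf = Counter(tok for tok in doc if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # A document made only of unseen tokens maps to the empty vector.
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


# Toy usage; whitespace splitting is an assumed stand-in for a real Bangla tokeniser.
train = [
    "ক্রিকেট দল জয়".split(),
    "বাজার দর পতন".split(),
    "ক্রিকেট বিশ্বকাপ দল".split(),
]
idf = fit_idf(train)
vec = transform("ক্রিকেট দল দল নতুন".split(), idf)
print(vec)
```

In the usage example, the unseen token নতুন is dropped, and because ক্রিকেট and দল each occur in two training documents (and so share the same IDF), দল, which appears twice, ends up with twice the weight of ক্রিকেট after normalisation. Rarer tokens such as বাজার receive a larger IDF than common ones, which is what lets TF-IDF emphasise category-specific vocabulary.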

Authors

  • Tanzir Hossain
    Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.
  • Ar-Rafi Islam
    Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.
  • Md Humaion Kabir Mehedi
    Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.
  • Annajiat Alim Rasel
    Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh.
  • M Abdullah-Al-Wadud
    Department of Software Engineering, College of Computer and Information Sciences, King Saud University, 11543, Riyadh, Saudi Arabia.
  • Jia Uddin
    AI and Big Data Department, Woosong University, Daejeon, 34606, South Korea.