Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data.

Journal: Scientific Reports
Published Date:

Abstract

Sentiment analysis is a Natural Language Processing task that detects and classifies emotions expressed in text. The emotion is directed at a specific target, such as an object, an incident, or an individual. Some tasks detect only whether emotion is present in a text, while others determine its polarity, classifying it as positive, negative, or neutral. Offensive language identification is the task of determining whether a comment contains inappropriate text that affects an individual or a group. Existing research, including most advances in both tasks, has concentrated on monolingual data sets in high-resource languages rather than on code-mixed data. Code-mixed data combines words and phrases from two or more distinct languages in a single text. Because code-mixed data is noisy, identifying emotion or offensive terms in such comments is challenging. The proposed system performs both sentiment analysis and offensive language identification on low-resource code-mixed Tamil-English data using machine learning, deep learning, and pre-trained models such as BERT, RoBERTa, and adapter-BERT. The dataset is taken from the shared task on multi-task learning at DravidianLangTech@ACL2022. This work also addresses the challenge of extracting semantically meaningful information from code-mixed data using word embeddings. The results show that the adapter-BERT model achieves the highest accuracy among the trained models: 65% for sentiment analysis and 79% for offensive language identification.
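
To make the notion of code-mixed text concrete, here is a minimal Python sketch that flags a comment as code-mixed when it contains both Tamil-script and Latin-script tokens. It is only an illustration of the definition, not the method used in the paper. The function names `token_script` and `is_code_mixed` are hypothetical, and the script-based check is an assumption: it will miss Tamil written in Latin letters (romanized Tamil), which is common in social media comments.

```python
def token_script(token: str) -> str:
    """Label a token 'tamil', 'latin', or 'other' by counting its characters."""
    # The Tamil Unicode block spans U+0B80 to U+0BFF.
    tamil = sum(1 for ch in token if "\u0b80" <= ch <= "\u0bff")
    # ASCII letters stand in for English here (an assumption of this sketch).
    latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
    if tamil == 0 and latin == 0:
        return "other"  # digits, punctuation, emoji, other scripts
    return "tamil" if tamil >= latin else "latin"


def is_code_mixed(text: str) -> bool:
    """Return True if the text has at least one Tamil and one Latin token."""
    scripts = {token_script(tok) for tok in text.split()}
    return {"tamil", "latin"} <= scripts


if __name__ == "__main__":
    for comment in ["இந்த படம் super hit", "this movie is great", "இந்த படம் நல்லது"]:
        print(comment, "->", is_code_mixed(comment))
```

A check like this could at most serve as a coarse filter before classification. Handling romanized Tamil would need token-level language identification, which is outside the scope of this sketch.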

Authors

  • Kogilavani Shanmugavadivel
    Department of Computer Science Engineering, Kongu Engineering College, Perundurai, Erode, 638 060 Tamil Nadu, India.
  • V E Sathishkumar
    Department of Information and Communication Engineering, Sunchon National University, Suncheon, Republic of Korea.
  • Sandhiya Raja
    Department of Information Technology, Kongu Engineering College, Perundurai, Erode, 638060, India.
  • T Bheema Lingaiah
    School of Biomedical Engineering, Jimma Institute of Technology, Jimma, Ethiopia.
  • S Neelakandan
    Department of IT, Jeppiaar Institute of Technology, Sriperumbudur, India. snksnk17@gmail.com.
  • Malliga Subramanian
    Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, India.