Sentiment classification for Telugu using transformer-based approaches on a multi-domain dataset.

Journal: Scientific Reports

Published Date: Jul 1, 2025

Abstract

Sentiment analysis is an essential component of Natural Language Processing (NLP) in resource-rich languages such as English. Low-resource languages such as Telugu, however, have received limited attention for several reasons, including a scarcity of corpora for training machine learning models and an absence of gold-standard datasets for evaluation. The current surge of transformer-based models in NLP has enabled exceptional performance across many tasks. Consequently, researchers are increasingly interested in exploring the potential of transformer-based models pre-trained on several languages for various NLP applications, particularly for low-resource languages. This research examines the efficacy of four pre-trained transformer-based models, specifically IndicBERT, RoBERTa, DeBERTa, and XLM-RoBERTa, for sentence-level sentiment analysis in Telugu. We evaluated all four models on our dataset, "Sentikanna," which comprises datasets from multiple domains for the Telugu language. We compared the performance of these models on three different datasets and observed promising results. XLM-RoBERTa achieves an accuracy of 79.42% on binary sentiment classification. This work can serve as a reliable benchmark for sentiment analysis in the Telugu language.

Authors

Kannaiah Chattu

Department of Computer Science & Engineering (AIML), Malla Reddy College of Engineering & Technology, Maisammaguda, Bhadurpalle, Hyderabad, 500100, Telangana, India.

K Adi Narayana Reddy

Department of Computer Science & Engineering, Faculty of Science and Technology (IcfaiTech), ICFAI Foundation for Higher Education (IFHE), Hyderabad, 501203, India.

Sai Babu Veesam

School of Computer Science and Engineering, VIT-AP University, Amaravathi, 522241, India. saibabuv@gmail.com.

Pardha Saradhi Chirumamilla

Senior Software Engineer, Unicon Systems Inc, Tampa, USA.

Vunnava Dinesh Babu

Department of CSE, RV Institute of Technology, Guntur, A.P, India.

Krishna Prakash

Department of Electronics and Communication Engineering, NRI Institute of Technology, Agripalli, Eluru, AP, 521212, India. k_krishna2k7@yahoo.co.in.

Shonak Bansal

Department of Electronics and Communication Engineering, Chandigarh University, Gharuan, Punjab, India. shonakk@gmail.com.

Mohammad Rashed Iqbal Faruque

Space Science Centre (ANGKASA), Institute of Climate Change (IPI), Universiti Kebangsaan Malaysia (UKM), 43600, Bangi, Selangor D. E., Malaysia. rashed@ukm.edu.my.

K S Al-Mugren

Physics Department, Science College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Keywords

Databases, Factual; Humans; Language; Machine Learning; Natural Language Processing

External Resources

PubMed ID: 40594681
