Emotion recognition for human-computer interaction using high-level descriptors.

Journal: Scientific Reports
Published Date:

Abstract

Recent research on Speech Emotion Recognition (SER) has relied heavily on Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNNs). This study applies DL to SER for Punjabi language speakers. It presents a novel approach to constructing and preprocessing a labeled speech corpus drawn from diverse social media sources. Using spectrograms as the primary feature representation, the proposed algorithm learns discriminative patterns for emotion recognition. The method is evaluated on a custom dataset derived from various Punjabi media sources, including films and web series. The proposed approach achieves an accuracy of 69%, compared with 49%, 52%, and 61% for decision trees, Naïve Bayes, and random forests, respectively. The proposed method thus improves accuracy in recognizing emotions from Punjabi speech signals.
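
As an illustration of the spectrogram representation mentioned in the abstract, the following is a minimal sketch of how a log-power spectrogram could be computed from a speech signal before it is passed to a CNN. It is not the authors' pipeline: the function name log_spectrogram, the 16 kHz sampling rate, the 25 ms frame length, the 10 ms hop, and the Hann window are all assumptions chosen for illustration, and a synthetic tone stands in for real Punjabi speech.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-power spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are in samples; the defaults correspond to 25 ms
    and 10 ms at 16 kHz (assumed values, not taken from the paper).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short signals so that at least one frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Build an index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # Power spectrum per frame; rfft keeps only non-negative frequencies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Convert to decibels and put frequency on the first axis.
    return 10.0 * np.log10(power + eps).T


# Synthetic one-second 440 Hz tone at 16 kHz (a stand-in for real speech).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(tone)
```

The resulting 2-D array can be treated like a single-channel image, which is what makes convolutional models a natural fit for this kind of input.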

Authors

  • Chaitanya Singla
    Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
  • Sukhdev Singh
    Department of Computer Science, Multani Mal Modi College, Patiala, Punjab, India.
  • Preeti Sharma
    Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
  • Nitin Mittal
    University Centre for Research and Development, Chandigarh University, Mohali, Punjab, 140413, India.
  • Fikreselam Gared
    Faculty of Electrical and Computer Engineering, Bahir Dar University, Bahir Dar, Ethiopia. fikreseafomi@gmail.com.