REVIEW ARTICLE: A Performance Benchmarking Review of Transformers for Speaker-Independent Speech Emotion Recognition.

Journal: International Journal of Neural Systems

Published Date: Jul 29, 2025

Abstract

Speech Emotion Recognition (SER) is becoming a key element of speech-based human-computer interfaces, endowing them with some form of empathy towards the emotional status of the human. Transformers have become a central Deep Learning (DL) architecture in natural language processing and signal processing, recently including audio signals for Automatic Speech Recognition (ASR) and SER. A central question addressed in this paper is the achievement of speaker-independent SER systems, i.e. systems that perform independently of a specific training set, enabling their deployment in real-world situations by overcoming the typical limitations of laboratory environments. This paper presents a comprehensive performance evaluation review of transformer architectures that have been proposed to deal with the SER task, carrying out an independent validation at different levels over the most relevant publicly available datasets for validation of SER models. The comprehensive experimental design implemented in this paper provides an accurate picture of the performance achieved by current state-of-the-art transformer models in speaker-independent SER. We have found that most experimental instances reach accuracies below 40% when a model is trained on a dataset and tested on a different one. A speaker-independent evaluation combining up to five datasets and testing on a different one achieves up to 58.85% accuracy. In conclusion, the SER results improved with the aggregation of datasets, indicating that model generalization can be enhanced by extracting data from diverse datasets.
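
The cross-dataset protocol described in the abstract (train on several datasets, test on one that was held out) can be illustrated with a minimal sketch. This is not the authors' code: the dataset names, the `Sample` layout, and the `fit_majority` placeholder model are all assumptions made for illustration, and a real experiment would substitute a transformer model and actual SER corpora.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical sample layout: (feature vector, emotion label).
Sample = Tuple[Sequence[float], str]
Predictor = Callable[[Sequence[float]], str]


def leave_one_dataset_out(names: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training dataset names, held-out dataset name) pairs."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


def accuracy(predict: Predictor, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)


def cross_corpus_accuracy(
    corpora: Dict[str, List[Sample]],
    fit: Callable[[List[Sample]], Predictor],
) -> Dict[str, float]:
    """Train on all datasets but one and report accuracy on the held-out one."""
    results = {}
    for train_names, test_name in leave_one_dataset_out(list(corpora)):
        # Aggregate the training datasets into one pool.
        train = [s for name in train_names for s in corpora[name]]
        predict = fit(train)
        results[test_name] = accuracy(predict, corpora[test_name])
    return results


def fit_majority(train: List[Sample]) -> Predictor:
    """Placeholder model (assumption): always predicts the most common label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


# Toy data with hypothetical dataset names "A", "B", "C".
toy_corpora = {
    "A": [([0.0], "happy"), ([1.0], "happy")],
    "B": [([0.0], "sad"), ([1.0], "happy")],
    "C": [([0.0], "happy")],
}
toy_results = cross_corpus_accuracy(toy_corpora, fit_majority)
```

Because no sample from the held-out dataset is used for training, its speakers and recording conditions are unseen by the model, which is the property that makes the evaluation speaker-independent in the sense used above.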

Authors

Francisco Portal

Department of Artificial Intelligence, Universidad Politécnica de Madrid, Madrid, Spain.

Javier De Lope

Department of Artificial Intelligence, Universidad Politécnica de Madrid (UPM), Madrid, Spain.

Manuel Graña

Computational Intelligence Group, Faculty of Informatics, Basque Country University (UPV/EHU), Paseo Manuel de Lardizabal 1, 20018 San Sebastian, Spain; Department of Computer Science and Artificial Intelligence, Faculty of Informatics, Basque Country University (UPV/EHU), Paseo Manuel de Lardizabal 1, 20018 San Sebastian, Spain; ENGINE Centre, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland.

External Resources

PubMed ID: 40726155
