Evaluating deep learning architectures for Speech Emotion Recognition.

Journal: Neural networks : the official journal of the International Neural Network Society

Published Date: Mar 21, 2017

Abstract

Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances.

Authors

Haytham M Fayek

School of Engineering, RMIT University, Melbourne VIC 3001, Australia. Electronic address: haytham.fayek@ieee.org.

Margaret Lech

School of Engineering, RMIT University, Melbourne VIC 3001, Australia. Electronic address: margaret.lech@rmit.edu.au.

Lawrence Cavedon

School of Science, RMIT University, Melbourne, Australia.

Keywords

Emotions; Machine Learning; Neural Networks, Computer; Speech Recognition Software

External Resources

PubMed ID: 28396068
