Feature and classifier-level domain adaptation in DistilHuBERT for cross-corpus speech emotion recognition.

Journal: Computers in biology and medicine

Abstract

Cross-corpus speech emotion recognition (CCSER) aims to develop robust models capable of accurately identifying a speaker's emotional state across diverse datasets. This task is challenged by variations in dataset characteristics, such as differences in gender distribution and language. Overcoming these challenges requires novel feature representation and classification strategies. This study uses self-supervised speech representations generated by the DistilHuBERT model and proposes domain adaptation methods at both the feature level (FDA) and the classifier level (CDA) for CCSER. We propose four FDA methods for adapting CCSER models to the target dataset. The first method integrates the CNN encoder of DistilHuBERT into a Siamese network trained with a contrastive loss function to learn a new feature space that minimizes domain shift across datasets. The second method transfers the CNN encoder pre-trained in the first FDA method into the DistilHuBERT model. The third method builds on the second; however, instead of transferring the CNN encoder, it fine-tunes DistilHuBERT on the target dataset. The fourth method fine-tunes both the CNN and transformer layers of DistilHuBERT for feature adaptation. Our emotion classifier incorporates attentive statistical pooling and a Maxout layer (as a dimension reduction block), followed by a softmax layer. We implement CDA by updating the classifier with part of the target dataset. Our datasets comprise EMODB, IEMOCAP, and ShEMO. Our fourth FDA method, combined with CDA, achieves the highest accuracy among our methods, reaching 92.01% on the EMODB dataset when ShEMO is used as the source.
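The abstract does not state which contrastive loss formulation the Siamese network uses. As an illustration only, the sketch below implements the standard margin-based contrastive loss on pairs of embeddings (for example, pooled CNN-encoder outputs from a source and a target utterance). The pairing rule (`same = 1` when both utterances share an emotion label), the `margin` value, and the function names are hypothetical assumptions, not the authors' code.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    emb_a: List[Sequence[float]],
    emb_b: List[Sequence[float]],
    same: List[int],
    margin: float = 1.0,
) -> float:
    """Margin-based contrastive loss averaged over a batch of pairs.

    Assumed convention (hypothetical): same[i] == 1 marks a pair that should
    be pulled together (e.g. same emotion, different corpora); same[i] == 0
    marks a pair that should be pushed at least `margin` apart.
    """
    total = 0.0
    for a, b, y in zip(emb_a, emb_b, same):
        d = euclidean(a, b)
        # Similar pairs pay d^2; dissimilar pairs pay only if closer than margin.
        total += y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2
    return total / len(same)


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.0, 0.0]]
    target = [[3.0, 4.0], [0.0, 0.5]]
    print(contrastive_loss(source, target, same=[1, 0]))
```

Minimizing this loss pulls same-emotion embeddings from different corpora together, which is one plausible way a shared feature space can reduce domain shift.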
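Attentive statistical pooling turns a variable-length sequence of frame-level features (here, DistilHuBERT outputs) into one fixed-size utterance vector. The sketch below assumes the common formulation in which each frame gets a score `v . tanh(W h_t + b)`, the scores are softmax-normalized into weights, and the output concatenates the weighted mean and weighted standard deviation. The parameter names `W`, `b`, `v` and the toy dimensions are illustrative assumptions; the paper's exact attention parameterization is not given in the abstract.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attentive_stats_pooling(frames: Matrix, W: Matrix, b: Vector, v: Vector) -> Vector:
    """Pool T frame vectors of size D into a 2*D vector [weighted mean; weighted std].

    Assumed attention score per frame: e_t = v . tanh(W h_t + b).
    """
    scores = []
    for h in frames:
        hidden = [
            math.tanh(sum(w_i * h_i for w_i, h_i in zip(row, h)) + b_j)
            for row, b_j in zip(W, b)
        ]
        scores.append(sum(v_j * z_j for v_j, z_j in zip(v, hidden)))
    alpha = softmax(scores)  # one weight per frame, summing to 1

    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    second = [sum(a * h[d] ** 2 for a, h in zip(alpha, frames)) for d in range(dim)]
    # Weighted variance = E[h^2] - E[h]^2; clamp to avoid sqrt of tiny negatives.
    std = [math.sqrt(max(s - m ** 2, 1e-10)) for s, m in zip(second, mean)]
    return mean + std


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]
    # With v = 0 every frame scores 0, so the weights are uniform.
    print(attentive_stats_pooling(frames, W=[[0.1, 0.2]], b=[0.0], v=[0.0]))
```

With uniform weights the result reduces to the ordinary mean and standard deviation; learned `W`, `b`, `v` let emotionally salient frames contribute more.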
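After pooling, the classifier applies a Maxout block for dimension reduction and then a softmax layer. The sketch below shows one forward pass under stated assumptions: each Maxout unit takes the maximum over `k` linear pieces of the pooled vector, and a single linear layer maps the Maxout output to emotion logits. All weight shapes and the helper names `maxout` and `classify` are hypothetical. Under CDA as described in the abstract, parameters of a head like this would be updated using part of the target dataset.

```python
import math
from typing import List

Vector = List[float]


def maxout(x: Vector, weights: List[List[Vector]], biases: List[Vector]) -> Vector:
    """Maxout layer: weights[j][k] is the k-th linear piece of output unit j.

    Output unit j = max_k (weights[j][k] . x + biases[j][k]); choosing fewer
    output units than len(x) reduces dimensionality.
    """
    out = []
    for pieces_w, pieces_b in zip(weights, biases):
        out.append(
            max(sum(w * xi for w, xi in zip(wk, x)) + bk for wk, bk in zip(pieces_w, pieces_b))
        )
    return out


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(pooled: Vector, mo_w, mo_b, out_w, out_b) -> Vector:
    """pooled vector -> Maxout (dimension reduction) -> linear -> softmax over emotions."""
    h = maxout(pooled, mo_w, mo_b)
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(out_w, out_b)]
    return softmax(logits)


if __name__ == "__main__":
    x = [1.0, -2.0]
    mo_w = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, -1.0], [-1.0, 0.0]]]  # 2 units, 2 pieces each
    mo_b = [[0.0, 0.0], [0.0, 0.0]]
    out_w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 hypothetical emotion classes
    out_b = [0.0, 0.0, 0.0]
    print(classify(x, mo_w, mo_b, out_w, out_b))
```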

Authors

  • Niloufar Naeeni
    Computer Engineering Department, K. N. Toosi University of Technology, Shariati Ave., Tehran, Iran. Electronic address: niloufar.naeeni@email.kntu.ac.ir.
  • Babak Nasersharif
    Computer Engineering Department, K. N. Toosi University of Technology, Shariati Ave., Tehran, Iran. Electronic address: bnasersharif@kntu.ac.ir.