Multi-modal sentiment recognition with residual gating network and emotion intensity attention.

Journal: Neural networks : the official journal of the International Neural Network Society

Published Date: Apr 25, 2025

Abstract

Multimodal emotion recognition predicts emotions from text, visual, and acoustic modalities, and prior work has made progress in this field. Previous approaches fall short in two respects: processing the complementary information among modalities, and avoiding long-term dependency while selecting the most important joint modal features. In this paper, we propose a new multimodal emotion recognition framework, MSRG, which consists of feature extraction (FE), emotional intensity attention (EIA), time-step level fusion (TLF), utterance level fusion (ULF), and a sentiment inference module (SIM). EIA is divided into adaptive multimodal linear pooling (AMLP) and joint cross-attention fusion (JCAF). AMLP adopts an adaptive multimodal fusion strategy to dynamically calculate adaptive coefficients for the three modalities, then performs a pooling operation to obtain joint modal features. JCAF calculates the attention weights and attention features of each modality based on the cross-correlation between individual and joint features. TLF performs feature alignment and fusion at the time-step level, then uses a residual gating network (RGN) to process the time-step level fused sequences. The resulting time-step level fused features are passed through two fully connected layers and an activation layer to obtain the time-step level emotion intensity. ULF fuses the utterance level representations of the three modalities by concatenation, then passes the utterance level fused features through a fully connected layer to obtain the utterance level emotion intensity. Finally, both the time-step level and the utterance level emotion intensities are input into SIM to obtain the final emotion prediction. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that MSRG achieves better prediction performance.
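
The abstract does not give formulas for AMLP or RGN, so the following is only a minimal sketch of the two general ideas it names: adaptive per-modality coefficients followed by pooling, and a gated residual update. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `adaptive_pool` and `residual_gate`, the dot-product scoring with a hypothetical `score_weights` vector, the softmax normalization, and the sigmoid gate.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_pool(text, visual, acoustic, score_weights):
    """Weighted sum of three equal-length modality vectors.

    Assumption: each modality is scored by a dot product with a
    hypothetical `score_weights` vector, and the scores are turned
    into coefficients with a softmax. The paper may do this differently.
    """
    modalities = [text, visual, acoustic]
    scores = [sum(w * x for w, x in zip(score_weights, m)) for m in modalities]
    coeffs = softmax(scores)
    joint = [
        sum(c * m[i] for c, m in zip(coeffs, modalities))
        for i in range(len(text))
    ]
    return coeffs, joint


def residual_gate(x, candidate, gate_logits):
    """Elementwise gated residual update: x + sigmoid(gate) * candidate.

    Assumption: a generic gated residual form, not the paper's RGN.
    """
    return [
        xi + (1.0 / (1.0 + math.exp(-g))) * ci
        for xi, ci, g in zip(x, candidate, gate_logits)
    ]
```

With an all-zero `score_weights` vector the three coefficients are equal, so the pooled vector reduces to the plain average of the modalities. A strongly negative gate logit drives the gate towards zero, so the residual path passes the input through almost unchanged.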

Authors

Yadi Wang

Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, 475004, China; Institute of Data and Knowledge Engineering, School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China; School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China. Electronic address: yadiwang@henu.edu.cn.

Xiaoding Guo

Harbin Institute of Technology, Harbin, China.

Xianhong Hou

School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China. Electronic address: xhhou@henu.edu.cn.

Zhijun Miao

Department of Urology, Suzhou Dushuhu Public Hospital, Suzhou, 215123, China.

Xiaojin Yang

School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China. Electronic address: yxj2001@henu.edu.cn.

Jinkai Guo

School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China. Electronic address: guojk@henu.edu.cn.

Keywords

Algorithms; Attention; Electroencephalography; Emotions; Humans; Neural Networks, Computer; Pattern Recognition, Automated

External Resources

PubMed ID (PMID): 40315702
