Cross-modal Context Fusion and Adaptive Graph Convolutional Network for Multimodal Conversational Emotion Recognition
Journal: arXiv
Published Date: Jan 25, 2025
Abstract
Emotion recognition has a wide range of applications in human-computer
interaction, marketing, healthcare, and other fields. In recent years, deep
learning has provided new methods for emotion recognition. Many emotion
recognition methods have been proposed, including multimodal ones, but they
ignore the mutual interference between different input modalities and pay
little attention to the direction of dialogue between speakers. Therefore, this
article proposes a new multimodal emotion recognition method comprising a
cross-modal context fusion module, an adaptive graph convolutional encoding
module, and an emotion classification module. The cross-modal context fusion
module consists of a cross-modal alignment module and a context fusion module,
which together reduce the noise introduced by mutual interference between
input modalities. The adaptive graph convolution module constructs a dialogue
relationship graph to extract dependencies between speakers and
self-dependencies within a speaker. Our model surpasses some state-of-the-art
methods on publicly available benchmark datasets and achieves high recognition
accuracy.
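
To illustrate the idea of a dialogue relationship graph, the following is a
minimal sketch, not the authors' implementation. It assumes that each utterance
is a node, that directed edges run from earlier utterances to later ones within
a fixed context window, and that an edge is labeled "self" when both utterances
come from the same speaker and "inter" otherwise. The function name
build_dialogue_graph, the window parameter, and the edge labels are all
hypothetical choices made for this example; the abstract does not specify them.

```python
from typing import List, Tuple

# (source utterance index, target utterance index, relation label)
Edge = Tuple[int, int, str]


def build_dialogue_graph(speakers: List[str], window: int = 2) -> List[Edge]:
    """Build directed edges from each utterance's recent past to itself.

    Assumption for illustration: only the `window` preceding utterances
    influence the current one, and the relation type depends solely on
    whether the two utterances share a speaker.
    """
    edges: List[Edge] = []
    for tgt in range(len(speakers)):
        # Look back at most `window` utterances, never before the start.
        for src in range(max(0, tgt - window), tgt):
            relation = "self" if speakers[src] == speakers[tgt] else "inter"
            edges.append((src, tgt, relation))
    return edges


# Two speakers alternating over four utterances.
edges = build_dialogue_graph(["A", "B", "A", "B"], window=2)
for edge in edges:
    print(edge)
```

In a full model, edges of this kind would typically be passed to a graph
convolution layer together with per-utterance features; how the adaptive part
of the proposed module chooses or weights them is not described in the abstract.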