Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection
Journal: arXiv
Published Date: Dec 9, 2024
Abstract
Previous studies have demonstrated that emotional features from a single
acoustic sentiment label can enhance depression diagnosis accuracy.
Additionally, according to the Emotion Context-Insensitivity theory and our
pilot study, individuals with depression might convey negative emotional
content in an unexpectedly calm manner, showing a high degree of inconsistency
in emotional expressions during natural conversations. So far, few studies have
recognized and leveraged the emotional expression inconsistency for depression
detection. In this paper, a multimodal cross-attention method is presented to
capture the Acoustic-Textual Emotional Inconsistency (ATEI) information. This
is achieved by analyzing the intricate local and long-term dependencies of
emotional expressions across acoustic and textual domains, as well as the
mismatch between the emotional content within both domains. A Transformer-based
model is then proposed to integrate this ATEI information with various fusion
strategies for detecting depression. Furthermore, a scaling technique is
employed to adjust the degree to which ATEI features contribute during the fusion process, thereby
enhancing the model's ability to discern patients with depression across
varying levels of severity. To the best of our knowledge, this work is the first to
incorporate emotional expression inconsistency information into depression
detection. Experimental results on a counseling conversational dataset
illustrate the effectiveness of our method.
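
The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes standard scaled dot-product cross-attention from acoustic frames to textual tokens, a plain difference as a stand-in for the inconsistency (ATEI) feature, and additive fusion with a scalar scale. The names `cross_attention`, `atei_features`, `fuse`, and `scale` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys/values."""
    d = len(keys[0])
    dim_out = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim_out)])
    return out


def atei_features(acoustic, textual):
    """Assumed inconsistency feature: acoustic emotion embedding minus the
    textual context that the acoustic frame attends to."""
    attended = cross_attention(acoustic, textual, textual)
    return [[a - t for a, t in zip(a_vec, t_vec)]
            for a_vec, t_vec in zip(acoustic, attended)]


def fuse(base, atei, scale):
    """Assumed additive fusion; `scale` controls how much ATEI contributes."""
    return [[b + scale * x for b, x in zip(b_vec, x_vec)]
            for b_vec, x_vec in zip(base, atei)]


# Toy emotion embeddings: two acoustic frames, one textual token.
acoustic = [[1.0, 0.0], [0.0, 1.0]]
textual = [[0.5, 0.5]]
atei = atei_features(acoustic, textual)
fused = fuse(acoustic, atei, scale=2.0)
```

With `scale=0.0` the fused representation reduces to the base features, while larger values give the mismatch signal more weight. This is one simple way to read "adjusting the ATEI feature degree", under the assumptions stated above.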