A Sentiment Pre-trained Text-Guided Multimodal Cross-Attention Transformer for Improved Depression Detection.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:

Abstract

Depression is a widespread mental health disorder that calls for efficient automated detection methods. Because the disorder is complex, traditional single-modality approaches are less effective, which has shifted attention to multimodal analysis. Recent advances include transformer-based fusion methods, yet their application to depression detection is often limited by the dominance of the text modality. To address this, we propose the Text-Guided Multimodal Cross-Attention Transformer, which enhances cross-modal interactions among text, audio, and video for more effective depression detection. Our approach uniquely pre-trains the encoders on a large sentiment dataset so that they better capture the emotion-related features crucial for identifying depression-related sentiment changes. On the AVEC2019 benchmark, our method outperforms current state-of-the-art depression detection techniques.
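
The abstract does not spell out the fusion mechanism, so the following is only a minimal sketch of what text-guided cross-attention could look like, not the authors' implementation. It assumes that text features act as queries while audio and video features act as keys and values, that all three modalities share one feature dimension, and that the attended features are simply concatenated with the text features. The learned projections, multi-head structure, and sentiment pre-trained encoders of a real transformer are omitted, and all function names (`cross_attention`, `text_guided_fusion`, `random_sequence`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query attends over all context vectors.

    Assumption: keys and values are both the raw context vectors
    (no learned projection matrices).
    """
    dim = len(queries[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        attended = [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        outputs.append(attended)
        all_weights.append(weights)
    return outputs, all_weights


def text_guided_fusion(text, audio, video):
    """Text queries pull in audio and video context; results are concatenated per token."""
    audio_ctx, _ = cross_attention(text, audio)
    video_ctx, _ = cross_attention(text, video)
    # List concatenation: each fused vector has 3 * dim entries.
    return [t + a + v for t, a, v in zip(text, audio_ctx, video_ctx)]


def random_sequence(rng, length, dim):
    """Placeholder features standing in for real encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]


rng = random.Random(0)
text = random_sequence(rng, 5, 8)    # 5 text tokens
audio = random_sequence(rng, 20, 8)  # 20 audio frames
video = random_sequence(rng, 12, 8)  # 12 video frames

fused = text_guided_fusion(text, audio, video)
print(len(fused), len(fused[0]))  # 5 24
```

In this sketch the output keeps the length of the text sequence, which is one way a text-guided design can let the text modality steer what is read from the audio and video streams.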

Authors

  • Shiyu Teng
  • Shurong Chai
    College of Information Science and Engineering, Ritsumeikan University, Kusatsushi 5250058, Shiga, Japan.
  • Jiaqing Liu
    College of Information Science and Engineering, Ritsumeikan University, Kusatsushi 5250058, Shiga, Japan.
  • Tomoko Tateyama
  • Lanfen Lin
    State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310027, China.
  • Yen-Wei Chen