A Sentiment Pre-trained Text-Guided Multimodal Cross-Attention Transformer for Improved Depression Detection.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference

PMID: 40040039

Abstract

Depression is a widespread mental health issue requiring efficient automated detection methods. Traditional single-modality approaches are less effective due to the disorder's complexity, leading to a focus on multimodal analysis. Recent advancements include transformer-based fusion methods, yet their application in depression detection is often limited by the dominant text modality. To address this, we propose the Text-Guided Multimodal Cross-Attention Transformer, enhancing cross-modal interactions between text, audio, and video for more effective depression detection. Our approach uniquely pre-trains encoders on a large sentiment dataset to better capture emotion-related features crucial for identifying depression-related sentiment changes. Our method demonstrates superior performance on the AVEC2019 benchmark, outperforming current state-of-the-art depression detection techniques.
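
The abstract does not specify the architecture, so the following is only a minimal sketch of what "text-guided" cross-attention could look like: text tokens act as queries, and audio and video frames act as keys and values. It assumes scaled dot-product attention without learned projections and a simple concatenation for fusion. The function names (`cross_attention`, `text_guided_fusion`) are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query row attends over all key rows
    # and returns a weighted average of the value rows.
    # Assumption: no learned Q/K/V projections, to keep the sketch short.
    d = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def text_guided_fusion(text, audio, video):
    # Hypothetical fusion: text tokens query the audio and video sequences,
    # then each text token is concatenated with what it retrieved.
    from_audio = cross_attention(text, audio, audio)
    from_video = cross_attention(text, video, video)
    return [t + a + v for t, a, v in zip(text, from_audio, from_video)]


# Toy inputs: 2 text tokens, 3 audio frames, 1 video frame, feature size 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
video = [[0.2, 0.1]]
fused = text_guided_fusion(text, audio, video)
```

In this sketch the fused sequence keeps the length of the text sequence, which is one way to read "text-guided": the audio and video streams contribute only through what the text tokens attend to.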

Authors

Shiyu Teng
Shurong Chai

College of Information Science and Engineering, Ritsumeikan University, Kusatsushi 5250058, Shiga, Japan.
Jiaqing Liu

College of Information Science and Engineering, Ritsumeikan University, Kusatsushi 5250058, Shiga, Japan.
Tomoko Tateyama
Lanfen Lin

State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310027, China.
Yen-Wei Chen

Keywords

Algorithms; Attention; Depression; Emotions; Humans; Natural Language Processing

