Depression screening with textual and audio features based on large language models and machine learning.
Journal:
Journal of Affective Disorders
Published Date:
Nov 16, 2025
Abstract
BACKGROUND: Depression is a complex disorder that cannot be fully screened using textual features alone, because audio features capture additional psychomotor and affective changes. This study integrates textual and audio features for depression screening and compares the performance of several machine learning models.

METHODS: This study used a large-scale multimodal psychology dataset of 1,275 participants (707 males, 568 females; aged 12-16 years) that combines PHQ-9 scores, textual interview responses, and mel-spectrograms derived from audio recordings. Textual features consisted of suicide risk scores from the Chinese Suicide Dictionary (CSD), together with emotional polarity probabilities and depression severity probabilities generated by large language models (LLMs). For the audio data, we characterized emotional state by the frequency (ratio) of eight emotions, which were predicted by a fine-tuned U-Net model from mel-spectrograms, mel-frequency cepstral coefficients (MFCCs), and chroma features. Finally, the textual and audio features were combined and evaluated with five machine learning models across eight metrics to identify the best-performing model.

RESULTS: Across the five machine learning models, multimodal fusion outperformed the unimodal approaches (text-only and audio-only) and achieved the lowest MAE and RMSE. The random forest regression (RFR) model performed best for depression prediction (Accuracy = 0.98, Precision = 0.98) when combined with the LLM features generated by prompt3. The most important predictors were depression severity, negative and positive emotional polarity, and suicide risk among the textual features, and the happy, angry, neutral, and surprise emotions among the audio features.

CONCLUSIONS: Combining audio and textual features improved the accuracy of depression screening. Future research could incorporate facial expressions and physiological indicators to further improve screening performance.
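The fusion step described in METHODS can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that an upstream audio classifier, standing in for the fine-tuned U-Net, has already labelled each audio segment with one of eight emotions. It also assumes that the text features (CSD suicide risk, LLM polarity and severity probabilities) arrive as plain numbers. The eight-emotion label set, the feature order, and the helper names `emotion_ratios` and `fuse` are hypothetical. The abstract names only four of the eight emotions.

```python
from collections import Counter
from math import sqrt

# Hypothetical eight-class label set (RAVDESS-style). The abstract names only
# happy, angry, neutral, and surprise, so the other four are an assumption.
EMOTIONS = ("neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprise")


def emotion_ratios(segment_labels):
    """Convert per-segment emotion predictions into the fraction of
    segments assigned to each emotion, in EMOTIONS order."""
    total = len(segment_labels)
    if total == 0:
        return [0.0] * len(EMOTIONS)
    counts = Counter(segment_labels)
    unknown = set(counts) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Counter returns 0 for emotions that never occur.
    return [counts[e] / total for e in EMOTIONS]


def fuse(text_features, audio_ratios):
    """Early fusion: concatenate text-derived scores with audio emotion ratios."""
    return list(text_features) + list(audio_ratios)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical text scores: [suicide_risk, neg_polarity, pos_polarity, severity]
    text = [0.1, 0.7, 0.2, 0.6]
    audio = emotion_ratios(["neutral", "sad", "sad", "happy"])
    x = fuse(text, audio)
    print(len(x))  # 4 text features + 8 emotion ratios = 12
    print(mae([5, 10, 15], [6, 9, 15]), rmse([5, 10, 15], [6, 9, 15]))
```

Concatenation (early fusion) leaves the choice of model open. The fused vectors could be passed to any of the regressors compared in the study, such as scikit-learn's `RandomForestRegressor`. The predictions could then be scored against held-out PHQ-9 scores with the MAE and RMSE helpers. Using PHQ-9 scores as the regression target is an assumption here.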