AI-Driven Speech Analysis for Mental Health Prediction in Individuals with Voice Disorders.

Journal: Journal of voice : official journal of the Voice Foundation
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: Mental health disorders are common among individuals with voice disorders, yet applications of AI-driven speech analysis in this population remain limited. It is also unclear whether the same acoustic features contribute to mental health prediction across different etiologies. We aimed to develop an interpretable, fairness-aware artificial intelligence and machine learning (AI/ML) model to identify individuals at risk of mental health disorders and to inform subsequent mental health screening and interventions. METHODS: This observational study analyzed a publicly available PhysioNet voice dataset. Demographic characteristics and static acoustic features derived from voice recordings were used as predictors of mental health disorders. We included individuals with voice disorders (including primary voice disorders and neurological/neurodegenerative conditions with vocal manifestations). We further mapped all acoustic features to three prespecified, clinician-facing domains: voice stability, voice clarity, and speech prosody (suprasegmental features, including pitch variability, intensity, and speech rate). Five algorithms were evaluated, and the model with the most balanced performance was selected based on balanced F1 scores. Interpretability was assessed using feature importance, SHapley Additive exPlanations (SHAP) beeswarm plots, and partial dependence plots. Gender fairness was evaluated at the group level and through individual-level counterfactual analyses. RESULTS: We included 328 participants with voice disorders, of whom 108 (33%) had a clinician-diagnosed mental health disorder. Random Forest achieved the most balanced performance: in the primary voice-disorder subgroup, recall = 0.77, precision = 0.72, and area under the receiver operating characteristic curve (AUROC) = 0.83; in the neurological/neurodegenerative subgroup, recall = 0.79, precision = 0.81, and AUROC = 0.84.
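The abstract reports recall and precision per subgroup but not the F1 values used for model selection. As a rough arithmetic check, the sketch below derives the standard F1 (harmonic mean of precision and recall) from the reported figures, giving about 0.74 for primary voice disorders and about 0.80 for neurological/neurodegenerative conditions, and confirms the 33% prevalence. These derived numbers are illustrative only: the inputs are already rounded, and the "balanced F1" used by the authors may be a different variant (for example, averaged across classes), which the abstract does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, per etiology subgroup.
reported = {
    "primary voice disorders": {"precision": 0.72, "recall": 0.77},
    "neurological/neurodegenerative": {"precision": 0.81, "recall": 0.79},
}

# Derived (not reported) F1 per subgroup, rounded to two decimals like the inputs.
derived_f1 = {
    name: round(f1_score(m["precision"], m["recall"]), 2)
    for name, m in reported.items()
}

# Prevalence check: 108 of 328 participants had a clinician-diagnosed disorder.
prevalence = round(108 / 328, 2)

for name, score in derived_f1.items():
    print(f"{name}: derived F1 ~ {score}")
print(f"prevalence ~ {prevalence}")
```

Because F1 is a harmonic mean, it stays close to the smaller of precision and recall, so a model cannot reach a high F1 by trading one of them away entirely.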
Feature attributions differed by etiology: voice stability was most informative in primary voice disorders, whereas speech prosody was dominant in neurological/neurodegenerative conditions. Both group-level gender fairness analyses and individual-level counterfactual tests indicated that predictions of mental health disorders were invariant to gender. CONCLUSIONS: Mental health disorders are common among individuals with voice disorders. Interpretable, fairness-aware AI/ML models can predict mental health disorders from acoustic features in this population; however, the most informative acoustic feature domains vary with the underlying etiology. These findings support the feasibility of ethical, explainable, and clinically relevant voice-based tools tailored to patients with communication challenges, serving as screening aids rather than as standalone diagnostic instruments.
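The abstract does not detail how the individual-level counterfactual test was run. One common form, sketched below under stated assumptions, flips each participant's gender while holding every acoustic feature fixed and counts how often the predicted label changes; a flip rate of 0 corresponds to the invariance described above. Everything here is hypothetical: the field names `gender` (assumed coded 0/1) and `jitter`, the toy threshold models standing in for the trained Random Forest, and the three example records are illustrative placeholders, not the study's data or implementation.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


def counterfactual_flip_rate(
    predict: Callable[[Record], int],
    records: List[Record],
    attr: str = "gender",
) -> float:
    """Fraction of records whose predicted label changes when `attr` is flipped.

    Assumes `attr` is coded 0/1 (a simplification); all other fields are unchanged.
    """
    if not records:
        raise ValueError("records must be non-empty")
    changed = 0
    for rec in records:
        flipped = dict(rec)            # copy so the original record is untouched
        flipped[attr] = 1 - rec[attr]  # counterfactual: swap the gender code only
        if predict(rec) != predict(flipped):
            changed += 1
    return changed / len(records)


# Hypothetical toy records: a gender code plus one made-up acoustic feature.
records: List[Record] = [
    {"gender": 0, "jitter": 1.2},
    {"gender": 1, "jitter": 0.4},
    {"gender": 0, "jitter": 0.9},
]


# Hypothetical model that ignores gender: it should be counterfactually invariant.
def invariant_model(r: Record) -> int:
    return int(r["jitter"] > 1.0)


# Hypothetical model that leaks gender into its score: some predictions will flip.
def gender_dependent_model(r: Record) -> int:
    return int(r["jitter"] + 0.5 * r["gender"] > 1.0)


print(counterfactual_flip_rate(invariant_model, records))
print(counterfactual_flip_rate(gender_dependent_model, records))
```

This individual-level check complements group-level fairness metrics: a model can show similar recall or precision across gender groups while still changing some individual predictions when only gender is altered, and vice versa.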
