Leveraging computational linguistics and machine learning for detection of ultra-high risk of mental health disorders in youths.

Journal: Schizophrenia (Heidelberg, Germany)

Published Date: Jul 15, 2025

Abstract

Mental illnesses often manifest through behavioral changes, with speech serving as a key medium for expressing thoughts and emotions. Applying computational linguistics to speech data is a promising approach to uncovering objective biomarkers for the early detection of mental illnesses. This study analyzed speech transcripts from 80 youths at ultra-high risk of psychosis (UHR) and 329 healthy controls, examining text features such as sentiment variability, cohesion, lexical sophistication, morphology, syntactic sophistication, and lexical diversity. Factor analysis revealed five key linguistic themes: Sentiment Intensity and Variability, Linguistic Register Alignment, Phonographic Uniqueness and Recognizability, Morphological Complexity and Imageability, and Lexical Richness and Typicalness. Regression analysis indicated that UHR speech is characterized by diminished sentiment variability (β = -0.07), deviation from linguistic registers (β = -0.16), fewer phonographic neighbors (β = -0.11), lower morphological complexity (β = -0.36), and more predictable lexical structures (β = 0.05). Optimized machine learning (ML) models trained on Boruta-selected features achieved a mean AUC of 0.70. These findings highlight the potential of sentiment and linguistic analyses of speech for training ML models to aid in the early detection and monitoring of mental health conditions.
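To make the reported metric concrete, the following is a minimal sketch of how a mean AUC can be computed from per-fold classifier scores. It is an illustration only, not the authors' pipeline: the function names `roc_auc` and `mean_auc` and the toy folds are assumptions introduced here, and the abstract does not state how the evaluation folds were defined.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the AUC over (labels, scores) pairs, one pair per fold."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Hypothetical toy folds (1 = UHR, 0 = healthy control); not study data.
toy_folds = [
    ([1, 0, 0, 0, 1], [0.9, 0.2, 0.4, 0.1, 0.3]),
    ([0, 1, 0, 0], [0.5, 0.5, 0.1, 0.7]),
]

if __name__ == "__main__":
    print(round(mean_auc(toy_folds), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Because AUC depends only on how cases are ranked, it is not distorted by the imbalance between the 80 UHR participants and the 329 controls in the way raw accuracy would be.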

Authors

Jordon Junyang Kho
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Shangzheng Song
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Samuel Ming Xuan Tan
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Nur Hikmah Fitriyah
School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.

Matheus Calvin Lokadjaja
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Jie Yin Yee
North Region, Institute of Mental Health, Singapore, Singapore.

Zixu Yang
Institute of Mental Health, Singapore, Singapore.

Eric Yu Hai Chen
Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Jimmy Lee
Institute of Mental Health, Singapore, Singapore.

Wilson Wen Bin Goh
School of Biological Sciences, Nanyang Technological University, Singapore 637551, Republic of Singapore. Electronic address: wilsongoh@ntu.edu.sg.

External Resources

PubMed ID: 40664678
