Disentangling Symptom Heterogeneity in Large-Scale Psychiatric Text: Domain-Adapted vs. Instruction-Tuned Transformers

Journal: medRxiv

Published Date: Feb 26, 2026

Abstract

Psychiatric assessment is complicated by symptom heterogeneity, high comorbidity, and the absence of objective biomarkers, which together produce substantial variability in clinical assessment and treatment selection. Patient-generated language captures rich information about subjective experience and symptom severity. Because this information can be systematically encoded and analyzed with computational models, it offers a scalable signal for psychiatric assessment. We compare two approaches: (i) a domain-specialized transformer fine-tuned on clinical language, based on the Bio-ClinicalBERT encoder architecture, and (ii) a large-scale instruction-tuned generalist encoder (Instructor-XL) used as a frozen feature extractor with a shallow classification head. A corpus of N = 151,228 de-identified texts was compiled from five public sources, covering four psychiatric phenotypes: anxiety, depression, schizophrenia, and suicidal intention. Models were evaluated using stratified 10-fold cross-validation with cost-sensitive training, prioritizing imbalance-aware metrics, including Macro-F1 and Matthews Correlation Coefficient (MCC), over accuracy. Bio-ClinicalBERT achieved superior overall performance (Macro-F1 = 0.78, MCC = 0.6752), indicating more reliable separation of diagnostically overlapping affective categories. In contrast, Instructor-XL achieved its highest class-specific performance for schizophrenia (F1 = 0.798). Explainability analyses suggest that the domain-specialized model places greater weight on clinically relevant terms, whereas the generalist model relies on a broader set of lexical features.
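To illustrate why imbalance-aware metrics are preferred over accuracy here, the following is a minimal sketch of Macro-F1 and multiclass MCC computed from label lists, together with one common cost-sensitive weighting heuristic. The function names, the toy labels, and the inverse-frequency weighting scheme are assumptions for illustration only; the abstract does not specify how the authors implemented their metrics or class weights.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for k in labels:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances gets F1 = 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred, labels):
    """Multiclass Matthews Correlation Coefficient (generalized form)."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # true count per class
    p_k = Counter(y_pred)                             # predicted count per class
    num = c * s - sum(t_k[k] * p_k[k] for k in labels)
    den = sqrt((s * s - sum(p_k[k] ** 2 for k in labels))
               * (s * s - sum(t_k[k] ** 2 for k in labels)))
    return num / den if den else 0.0


def inverse_frequency_weights(y_true, labels):
    """Assumed weighting heuristic: w_k = N / (K * n_k), larger for rarer classes."""
    n = len(y_true)
    counts = Counter(y_true)
    return {k: n / (len(labels) * counts[k]) for k in labels if counts[k]}


if __name__ == "__main__":
    # Toy, made-up labels: a classifier that always predicts the majority class.
    labels = ["depression", "schizophrenia"]
    y_true = ["depression", "depression", "depression", "schizophrenia"]
    y_pred = ["depression"] * 4
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)
    print("macro-F1:", macro_f1(y_true, y_pred, labels))
    print("MCC:", multiclass_mcc(y_true, y_pred, labels))
    print("class weights:", inverse_frequency_weights(y_true, labels))
```

In the toy case, a majority-class predictor still reaches high accuracy while its Macro-F1 is much lower and its MCC is zero, which is the behavior that motivates reporting these metrics on imbalanced phenotype labels.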

Authors

Varone G.; Kumar P.; Brown J.; Boulila W.

