Evaluating large language models for predicting psychiatric acute readmissions from clinical notes of population-based EHR
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Psychiatric patients often have complex symptoms and anamneses that are recorded as unstructured clinical notes. Large language models (LLMs) now enable large-scale use of text data. However, there is currently a lack of LLMs specialized for psychiatric clinical data and for non-English data, which hinders the application of LLMs across diverse clinical domains and countries. We present PsyRoBERTa, the first LLM specialized for clinical psychiatry. It was developed using population-based data comprising the largest collection to date of psychiatrically relevant clinical notes (∼44 million notes), covering the eastern half of Denmark. The model was evaluated against three publicly available models, pretrained on either general-domain or medical-domain public text, and against a baseline logistic regression classifier. Through extensive evaluations, we investigated the effect of domain-specific pretraining on predicting acute readmissions to psychiatric hospitals, explored important features, and reflected on the advantages and disadvantages of LLMs. PsyRoBERTa outperformed the prior models (AUC=0.74), captured information aligning with clinical practice, and additionally recognized psychiatric diagnoses (AUC=0.85). These results demonstrate the importance of domain-specific pretraining and the potential of LLMs to leverage psychiatric clinical notes for improving the prediction of psychiatric outcomes.
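The abstract reports performance as AUC (area under the ROC curve). This minimal sketch assumes a binary readmission label and one model score per patient, and shows what such a figure measures. AUC equals the probability that a randomly chosen readmitted patient is scored higher than a randomly chosen non-readmitted one, with ties counting one half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs where
    the positive example gets the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive correctly ranked above negative
        elif p == n:
            wins += 0.5  # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = acute readmission, 0 = none.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.4, 0.2, 0.7]
print(roc_auc(labels, scores))
```

Under this reading, AUC=0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The reported AUC=0.74 therefore means a readmitted patient is ranked above a non-readmitted one roughly three times out of four. The pairwise loop costs O(n_pos × n_neg) and is meant only for illustration. For cohorts of realistic size, a rank-based or library implementation such as scikit-learn's `roc_auc_score` would typically be used instead.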