An Ensemble Approach Integrating Retrieval-Augmented Large Language Models and Boosting Algorithms for Enhanced Catatonia Phenotyping.

Journal: Studies in health technology and informatics

Published Date: Aug 7, 2025

Abstract

A critical first step in using large-scale data to study catatonia is the development of precise phenotyping algorithms that can identify instances of the condition. In this work, we present an ensemble approach that combines retrieval-augmented generation (RAG) large language models (LLMs) with boosting algorithms to phenotype catatonia from the electronic health records (EHRs) of 3.5 million individuals seen at a large academic medical center from 2006 to 2017. Although the ensemble model achieved an AUROC of 0.709, slightly lower than the boosting algorithm alone (AUROC = 0.713), the inclusion of the RAG-LLM component provides enhanced interpretability. In particular, the RAG-LLM can identify contextually complex clinical features, such as those described by the Bush-Francis Catatonia Rating Scale, directly from clinical notes. These results highlight the potential of RAG-LLMs to capture nuanced contextual cues and fulfill complex catatonia phenotype definitions, even when overall classification performance is comparable to more traditional machine learning methods.
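The comparison described above can be illustrated with a minimal sketch. The abstract does not say how the RAG-LLM output and the boosting output are combined, so the weighted average used here is an assumption rather than the authors' method. The function names, the `weight` parameter, and the toy labels and scores are hypothetical and do not reproduce the study's data or results.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_scores(
    boost: Sequence[float], llm: Sequence[float], weight: float = 0.5
) -> List[float]:
    """Weighted average of two per-patient scores (assumed combination rule).

    `weight` is the share given to the boosting score; the rest goes to the
    RAG-LLM score.
    """
    if len(boost) != len(llm):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [weight * b + (1.0 - weight) * l for b, l in zip(boost, llm)]


# Hypothetical toy data: 1 = catatonia case, 0 = control.
labels = [1, 0, 1, 0, 1, 0]
boost_scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]  # made-up boosting probabilities
llm_scores = [0.7, 0.4, 0.3, 0.1, 0.9, 0.5]    # made-up RAG-LLM scores

combined = ensemble_scores(boost_scores, llm_scores, weight=0.5)

print("boosting AUROC:", round(auroc(labels, boost_scores), 3))
print("RAG-LLM AUROC: ", round(auroc(labels, llm_scores), 3))
print("ensemble AUROC:", round(auroc(labels, combined), 3))
```

On real data the ensemble need not beat either component, as the reported AUROCs of 0.709 (ensemble) and 0.713 (boosting alone) show. In such a sketch the weight would normally be tuned on a held-out validation set.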

Authors

Yubo Feng
Vanderbilt University, Nashville, Tennessee, USA.

Ruiyan Ma
Vanderbilt University, Nashville, TN, USA.

Xinmeng Zhang
Department of Computer Science, Vanderbilt University, Nashville, TN 37212, United States.

You Chen
Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.

Keywords

Algorithms; Catatonia; Electronic Health Records; Humans; Large Language Models; Machine Learning; Natural Language Processing; Phenotype

External Resources

PubMed (40775938)
