An Ensemble Approach Integrating Retrieval-Augmented Large Language Models and Boosting Algorithms for Enhanced Catatonia Phenotyping.

Journal: Studies in health technology and informatics
Published Date:

Abstract

A critical first step in using large-scale data to study catatonia is the development of precise phenotyping algorithms that can identify instances of the condition. In this work, we present an ensemble approach that combines retrieval-augmented generation (RAG) with large language models (LLMs) and boosting algorithms to phenotype catatonia from the electronic health records (EHRs) of 3.5 million individuals seen at a large academic medical center from 2006 to 2017. Although the ensemble model achieved an AUROC of 0.709, slightly lower than that of the boosting algorithm alone (AUROC = 0.713), the inclusion of the RAG-LLM component provided enhanced interpretability. In particular, the RAG-LLM can identify contextually complex clinical features, such as those described by the Bush-Francis Catatonia Rating Scale, directly from clinical notes. These results highlight the potential of RAG-LLMs to capture nuanced contextual cues and fulfill complex catatonia phenotype definitions, even when overall classification performance is comparable to that of more traditional machine learning methods.
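
The abstract compares models by AUROC but does not say how the RAG-LLM and boosting outputs are combined. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes each model emits one score per patient and that the ensemble is a weighted average of the two. The names `auroc`, `blend`, and `weight`, and all toy numbers, are hypothetical and say nothing about the study's results.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def blend(boost_scores: Sequence[float],
          llm_scores: Sequence[float],
          weight: float = 0.5) -> List[float]:
    """Weighted average of two per-patient score lists.

    ASSUMPTION: averaging is a stand-in combination rule; the abstract
    does not describe how the ensemble is actually formed.
    """
    if len(boost_scores) != len(llm_scores):
        raise ValueError("score lists must have the same length")
    return [weight * b + (1.0 - weight) * l
            for b, l in zip(boost_scores, llm_scores)]


# Toy data (made up): 1 = catatonia case, 0 = control.
labels = [1, 0, 1, 0, 1, 0]
boost_scores = [0.9, 0.4, 0.6, 0.7, 0.8, 0.2]
llm_scores = [0.7, 0.1, 0.9, 0.3, 0.4, 0.5]

print("boosting AUROC:", auroc(labels, boost_scores))
print("RAG-LLM AUROC: ", auroc(labels, llm_scores))
print("blended AUROC: ", auroc(labels, blend(boost_scores, llm_scores)))
```

The pairwise definition is quadratic in the number of patients, so it suits only small examples; at the scale of millions of records a rank-based implementation from an established library would be the practical choice.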

Authors

  • Yubo Feng
    Vanderbilt University, Nashville, TN, USA.
  • Ruiyan Ma
    Vanderbilt University, Nashville, TN, USA.
  • Xinmeng Zhang
    Department of Computer Science, Vanderbilt University, Nashville, TN 37212, USA.
  • You Chen
    Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.