A retrieval-augmented generation large language model framework for accurate dementia identification from electronic health records

Journal: medRxiv
Published Date:

Abstract

Objective: Accurate and scalable disease phenotyping from electronic health records (EHRs) is foundational for predictive modeling and precision medicine. Traditional rule- and keyword-based approaches are limited by inconsistent documentation and inability to capture clinical nuance. We aim to evaluate whether large language models (LLMs) can overcome these limitations to improve dementia phenotyping from real-world EHR data.

Methods: We developed and evaluated a framework integrating large language models and retrieval-augmented generation (RAG) to improve dementia identification from EHRs. Using Mass General Brigham EHR data, we identified a cohort of potential dementia cases and established gold-standard labels through chart review. Among 623 candidate cases, we compared rule-based classification, keyword-filtered LLMs, and RAG-based LLMs.

Results: The RAG-based classifier achieved the highest performance (F1=0.933, sensitivity=91.1%, PPV=95.5%) compared to rule-based (F1=0.823, sensitivity=81.1%, PPV=83.5%) and keyword-filtered LLM (F1=0.903, sensitivity=91.7%, PPV=88.6%). Error analysis revealed that structured-code dependence contributed to false positives, whereas unrecognized contextual cues in notes drove false negatives.

Conclusion: This framework demonstrates how RAG-based LLMs can produce reliable, context-aware dementia phenotypes to support predictive modeling, early detection, and precision care strategies across real-world populations.
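
The three metrics reported in the Results are related: F1 is the harmonic mean of sensitivity (recall) and PPV (precision). The sketch below recomputes F1 from the rounded sensitivity and PPV values quoted in the abstract, as a consistency check only. The function and variable names are illustrative assumptions, not part of the authors' framework. Because the inputs are rounded percentages, the recomputed values may differ from the reported F1 in the third decimal place.

```python
def f1_from_sensitivity_ppv(sensitivity: float, ppv: float) -> float:
    """Return F1, the harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Values copied from the abstract, expressed as fractions.
# These are rounded figures, so recomputed F1 is approximate.
reported = {
    "rule-based": {"sensitivity": 0.811, "ppv": 0.835, "f1": 0.823},
    "keyword-filtered LLM": {"sensitivity": 0.917, "ppv": 0.886, "f1": 0.903},
    "RAG-based LLM": {"sensitivity": 0.911, "ppv": 0.955, "f1": 0.933},
}

# Recompute F1 for each classifier from its sensitivity and PPV.
recomputed = {
    name: f1_from_sensitivity_ppv(m["sensitivity"], m["ppv"])
    for name, m in reported.items()
}

for name, m in reported.items():
    print(f"{name}: reported F1={m['f1']:.3f}, recomputed F1={recomputed[name]:.3f}")
```

The recomputed ranking matches the one stated in the abstract: RAG-based LLM first, keyword-filtered LLM second, rule-based classification third.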

Authors

  • Wang, L.
  • Liu, B.
  • Yang, R.
  • Chuang, Y.-W.
  • Estiri, H.
  • Murphy, S.
  • Zhou, L.
  • Marshall, G.