Evaluating Large Language Models for Translating Multimodal Phenotype Documentations into Executable EHR Phenotyping Algorithms

Journal: medRxiv
Published Date:

Abstract

Research applications of electronic health record (EHR) phenotypes require translating clinical definitions into executable EHR database queries, a labor-intensive process. We evaluated two frontier large language models across five phenotypes and three documentation modalities. Both models captured high-level logic from structured text, but their performance degraded markedly when given diagram-only input. Error analysis revealed seven categories of failure. Documentation quality, rather than model capability, was the primary bottleneck, reinforcing the need for standardized documentation and expert oversight.

Authors

  • Yan, C.
  • Xin, Y.
  • Su, W.-C.
  • Gangireddy, S.
  • Durbhakula, S.
  • Bruehl, S. P.
  • Dickson, A. L.
  • Li, L.
  • Feng, Q.
  • Malin, B. A.
  • Derr, T.
  • Wei, W.-Q.