Development and Comparative Evaluation of Three Artificial Intelligence Models (NLP, LLM, JEPA) for Predicting Triage in Emergency Departments: A 7-Month Retrospective Proof-of-Concept
Journal:
arXiv
Published Date:
Jul 1, 2025
Abstract
Triage errors, including undertriage and overtriage, are persistent
challenges in emergency departments (EDs). With increasing patient influx and
staff shortages, the integration of artificial intelligence (AI) into triage
protocols has gained attention. This study compares the performance of three AI
models [Natural Language Processing (NLP), Large Language Models (LLM), and
Joint Embedding Predictive Architecture (JEPA)] in predicting triage outcomes
against the FRENCH scale and clinical practice.We conducted a retrospective
analysis of a prospectively recruited cohort gathering adult patient triage
data over a 7-month period at the Roger Salengro Hospital ED (Lille, France).
Three AI models were trained and validated : (1) TRIAGEMASTER (NLP), (2)
URGENTIAPARSE (LLM), and (3) EMERGINET (JEPA). Data included demographic
details, verbatim chief complaints, vital signs, and triage outcomes based on
the FRENCH scale and GEMSA coding. The primary outcome was the concordance of
AI-predicted triage level with the FRENCH gold-standard. It was assessed thanks
to various indicators : F1-Score, Weighted Kappa, Spearman, MAE, RMSE. The LLM
model (URGENTIAPARSE) showed higher accuracy (composite score: 2.514) compared
to JEPA (EMERGINET, 0.438) and NLP (TRIAGEMASTER, -3.511), outperforming nurse
triage (-4.343). Secondary analyses highlighted the effectiveness of
URGENTIAPARSE in predicting hospitalization needs (GEMSA) and its robustness
with structured data versus raw transcripts (either for GEMSA prediction or for
FRENCH prediction). LLM architecture, through abstraction of patient
representations, offers the most accurate triage predictions among tested
models. Integrating AI into ED workflows could enhance patient safety and
operational efficiency, though integration into clinical workflows requires
addressing model limitations and ensuring ethical transparency.