Generative AI Demonstrated Difficulty Reasoning on Nursing Flowsheet Data.
Journal:
AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:
May 22, 2025
Abstract
Excessive documentation burden is linked to clinician burnout, motivating efforts to reduce it. Generative artificial intelligence (AI) offers opportunities for burden reduction but requires rigorous assessment. We evaluated the ability of a large language model (LLM) (OpenAI's GPT-4) to interpret various intervention-response relationships presented on nursing flowsheets, assessed its performance using MUC-5 evaluation metrics, and compared its assessments to those of nurse expert evaluators. ChatGPT correctly assessed 3 of 14 clinical scenarios and partially correctly assessed 6 of 14, frequently omitting data from its reasoning. Nurse expert evaluators correctly assessed all relationships and provided additional language reflective of standard nursing practice beyond the intervention-response relationships evidenced in nursing flowsheets. Future work should ensure that the training data used for electronic health record (EHR)-integrated LLMs includes all types of narrative nursing documentation that reflect nurses' clinical reasoning, and that verification of LLM-based information summarization does not burden end-users.
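To make the scoring approach concrete, the sketch below shows how MUC-style recall and precision are conventionally computed from response categories, with partially correct responses given half credit. The abstract does not say which MUC-5 measures were reported or how the 14 scenarios were mapped to categories, so the class name `MucCounts`, the one-slot-per-scenario mapping, and the treatment of the remaining 5 scenarios are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MucCounts:
    """Response tallies in the MUC scoring categories (hypothetical container)."""
    correct: int
    partial: int
    incorrect: int = 0
    missing: int = 0
    spurious: int = 0

    @property
    def possible(self) -> int:
        # Number of fills present in the answer key.
        return self.correct + self.partial + self.incorrect + self.missing

    @property
    def actual(self) -> int:
        # Number of fills the system actually produced.
        return self.correct + self.partial + self.incorrect + self.spurious


def recall(c: MucCounts) -> float:
    """MUC recall: partial matches earn half credit."""
    return (c.correct + 0.5 * c.partial) / c.possible if c.possible else 0.0


def precision(c: MucCounts) -> float:
    """MUC precision: same numerator, divided by the fills actually produced."""
    return (c.correct + 0.5 * c.partial) / c.actual if c.actual else 0.0


# Counts from the abstract: 3 correct and 6 partially correct of 14 scenarios.
# ASSUMPTION: each scenario is one slot, and the remaining 5 are entered as
# "incorrect". The abstract does not give the incorrect/missing split; recall
# is unaffected by that split because both categories count toward `possible`.
llm = MucCounts(correct=3, partial=6, incorrect=5)
print(f"possible={llm.possible}, recall={recall(llm):.2f}")
```

Under these assumptions the recall works out to (3 + 0.5 × 6) / 14. This is an illustrative calculation, not a figure reported by the authors. Precision would also depend on the unreported split between incorrect and missing responses, so it is not computed for the abstract's counts here.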