Generative AI Demonstrated Difficulty Reasoning on Nursing Flowsheet Data.

Journal: AMIA Annual Symposium Proceedings

Abstract

Excessive documentation burden is linked to clinician burnout, motivating efforts to reduce that burden. Generative artificial intelligence (AI) presents opportunities for burden reduction but requires rigorous assessment. We evaluated the ability of a large language model (LLM), OpenAI's GPT-4, to interpret various intervention-response relationships presented in nursing flowsheets, assessing performance using MUC-5 evaluation metrics and comparing its assessments to those of nurse expert evaluators. ChatGPT correctly assessed 3 of 14 clinical scenarios and partially correctly assessed 6 of 14, frequently omitting data from its reasoning. Nurse expert evaluators correctly assessed all relationships and provided additional language, reflective of standard nursing practice, beyond the intervention-response relationships evidenced in nursing flowsheets. Future work should ensure that the training data used for electronic health record (EHR)-integrated LLMs include all types of narrative nursing documentation that reflect nurses' clinical reasoning, and that verification of LLM-based information summarization does not itself burden end users.
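
The abstract cites MUC-5 evaluation metrics. As a hedged illustration only (not the authors' scoring code), the sketch below shows how MUC-style category counts (correct, partial, incorrect, missing, spurious) are conventionally combined into recall and precision; all names are hypothetical, and the split of the remaining scenarios in the example is an assumption for illustration.

    from dataclasses import dataclass

    @dataclass
    class MucCounts:
        """MUC-style scoring categories for one set of assessments."""
        correct: int    # fully correct responses (COR)
        partial: int    # partially correct responses (PAR)
        incorrect: int  # incorrect responses (INC)
        missing: int    # expected items never addressed (MIS)
        spurious: int   # unexpected items produced by the system (SPU)

    def muc_recall(c: MucCounts) -> float:
        """Recall = (COR + 0.5*PAR) / (COR + PAR + INC + MIS)."""
        possible = c.correct + c.partial + c.incorrect + c.missing
        return (c.correct + 0.5 * c.partial) / possible if possible else 0.0

    def muc_precision(c: MucCounts) -> float:
        """Precision = (COR + 0.5*PAR) / (COR + PAR + INC + SPU)."""
        actual = c.correct + c.partial + c.incorrect + c.spurious
        return (c.correct + 0.5 * c.partial) / actual if actual else 0.0

    # Hypothetical counts loosely mirroring the abstract's headline numbers:
    # 3 correct and 6 partially correct of 14 scenarios; treating the rest
    # as incorrect is an assumption, not a result reported in the abstract.
    counts = MucCounts(correct=3, partial=6, incorrect=5, missing=0, spurious=0)
    print(f"recall={muc_recall(counts):.2f}, precision={muc_precision(counts):.2f}")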

Authors

  • Courtney J Diamond
    Department of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA
  • Jennifer Thate
    Department of Biomedical Informatics, Columbia University, New York, NY, USA
  • Jennifer B Withall
    Department of Biomedical Informatics, Columbia University, New York, NY, USA
  • Rachel Y Lee
    School of Nursing, Columbia University, New York, NY, USA
  • Kenrick Cato
    School of Nursing, Columbia University, New York, NY, USA
  • Sarah C Rossetti
    School of Nursing, Columbia University, New York, NY, USA