GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial.

Journal: Nature Medicine
PMID:

Abstract

While large language models (LLMs) have shown promise in diagnostic reasoning, their impact on management reasoning, which involves balancing treatment decisions and testing strategies while managing risk, is unknown. This prospective, randomized, controlled trial assessed whether LLM assistance improves physician performance on open-ended management reasoning tasks compared to conventional resources. From November 2023 to April 2024, 92 practicing physicians were randomized to use either GPT-4 plus conventional resources or conventional resources alone to answer five expert-developed clinical vignettes in a simulated setting. All cases were based on real, de-identified patient encounters, with information revealed sequentially to mirror the nature of clinical environments. The primary outcome was the difference in total score between groups on expert-developed scoring rubrics. Secondary outcomes included domain-specific scores and time spent per case. Physicians using the LLM scored significantly higher than those using conventional resources (mean difference = 6.5%, 95% confidence interval (CI) = 2.7 to 10.2, P < 0.001). LLM users spent more time per case (mean difference = 119.3 s, 95% CI = 17.4 to 221.2, P = 0.02). There was no significant difference between LLM-augmented physicians and LLM alone (-0.9%, 95% CI = -9.0 to 7.2, P = 0.8). LLM assistance can improve physician management reasoning in complex clinical vignettes compared to conventional resources and should be validated in real clinical practice. ClinicalTrials.gov registration: NCT06208423.
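As a reading aid, the reported estimates, confidence intervals, and P values can be checked against one another with a simple normal approximation. The sketch below is not the trial's analysis, whose statistical model the abstract does not describe. It assumes each 95% CI is symmetric and equal to the estimate ± 1.96 standard errors, and the function name `normal_approx_from_ci` is hypothetical.

```python
import math


def normal_approx_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate a standard error, z statistic, and two-sided P value
    from a point estimate and its 95% CI.

    Assumption (not from the source): the CI is estimate +/- z_crit * SE,
    i.e. a symmetric normal-approximation interval.
    """
    se = (upper - lower) / (2 * z_crit)      # CI half-width divided by 1.96
    z = estimate / se                        # test statistic against zero difference
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail probability
    return se, z, p


# Values as reported in the abstract.
reported = {
    "total score, LLM vs conventional (%)": (6.5, 2.7, 10.2),
    "time per case, LLM vs conventional (s)": (119.3, 17.4, 221.2),
    "total score, LLM-augmented vs LLM alone (%)": (-0.9, -9.0, 7.2),
}

for label, (est, lo, hi) in reported.items():
    se, z, p = normal_approx_from_ci(est, lo, hi)
    print(f"{label}: SE = {se:.2f}, z = {z:.2f}, approx. P = {p:.4f}")
```

Under this approximation, the three back-calculated P values fall in line with the reported P < 0.001, P = 0.02, and P = 0.8, which suggests the intervals and P values in the abstract are mutually consistent.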

Authors

  • Ethan Goh
    Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
  • Robert J Gallo
    Center for Innovation to Implementation, VA Palo Alto Health Care System, Palo Alto, CA, USA.
  • Eric Strong
    Stanford University School of Medicine, Stanford, CA, USA.
  • Yingjie Weng
    Quantitative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA.
  • Hannah Kerman
    Division of General Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Jason A Freed
    Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Joséphine A Cool
    Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Zahir Kanjee
    Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Kathleen P Lane
    Division of Hospital Medicine, University of Minnesota Medical School, Minneapolis, MN, USA.
  • Andrew S Parsons
    Division of Hospital Medicine, University of Virginia School of Medicine, Charlottesville, VA, USA.
  • Neera Ahuja
    Stanford University School of Medicine, Stanford, CA, USA.
  • Eric Horvitz
    Microsoft.
  • Daniel Yang
    Kaiser Permanente, Oakland, CA, USA.
  • Arnold Milstein
    Stanford Clinical Excellence Research Center, Stanford University, Stanford, CA, USA.
  • Andrew P J Olson
    Division of Hospital Medicine, University of Minnesota Medical School, Minneapolis, MN, USA.
  • Jason Hom
    Stanford University School of Medicine, Stanford, CA, USA.
  • Jonathan H Chen
    Stanford Center for Biomedical Informatics Research, Stanford, CA, USA.
  • Adam Rodman
    Beth Israel Deaconess Medical Center, Boston, MA, USA.