AI-Simulated Clinical Consultations: Assessing the Potential of ChatGPT to Support Medical Training

Journal: medRxiv
Published Date:

Abstract

Simulated medical scenarios are useful for evaluating and developing clinical competencies, but scheduling them is expensive and time-consuming. Large language models (LLMs) show promise in role-playing tasks. We investigated the fidelity with which ChatGPT can mimic patients, clinicians and examiners in educational settings, aiming to determine the realism with which it can portray these roles and the utility of the resulting agents in clinical education.

We selected four paediatric scenarios from mock objective structured clinical examinations (OSCEs) and set up separate patient, doctor and examiner ChatGPT agents for each. The patient and doctor agents conversed with each other in written format. The examiner agent marked the doctor agent based on this conversation. Patients and clinicians familiar with the OSCE assessed the dialogues.

The patient agent was judged to be true to character most of the time and good at expressing emotion. The doctor agent was reported to be an effective communicator but occasionally used jargon. Both agents tended to produce repetitive responses, which undermined realism. The examiner agent's marks correlated well with those of human clinicians. There was moderate support for using the simulated interactions for educational purposes.

Although the realism of the agents can be improved, ChatGPT can generate plausible proxies of participants in medical scenarios and could be useful for complementing standardised patient (SP)-based training.

LLM-based agents show promise for portraying clinical roles and supporting simulation-based learning. Doctor agents provide correct diagnoses most of the time, while patient agents can accurately relay role information such as medical history or symptoms. There is scope for improvement in the realism and authenticity of the conversations produced by GPT patient and doctor agents. Notable issues included a tendency to produce repetitive and verbose responses, and an inability to accurately convey the hesitation shown by real patients.
Disparities observed between human patient and clinician assessments of the GPT agents suggest that diverse viewpoints are needed to fully capture the experiential learning associated with clinical communication.

How this study might affect research, practice or policy

Low fidelity of GPT simulations for difficult or challenging medical scenarios necessitates human oversight and correction for AI deployed in educational settings. The impact of AI on medical education is likely to increase in the future, which necessitates promoting AI literacy among educators and students.

Authors

  • Arpita Saggar; Vania Dimitrova; Duygu Sarikaya; David C. Hogg; Jonathan C. Darling