Which curriculum components do medical students find most helpful for evaluating AI outputs?

Journal: BMC Medical Education
PMID:

Abstract

INTRODUCTION: Both the risk and the opportunity of Large Language Models (LLMs) in medical education rest in their imitation of human communication. Future doctors working with generative artificial intelligence (AI) will need to judge the value of LLM outputs in order to safely direct the management of patients. We set out to investigate medical students' ability to evaluate LLM responses to clinical vignettes, to identify which prior learning they drew on to scrutinise the LLM answers, and to assess their awareness of 'clinical prompt engineering'.

Authors

  • William J Waldock
    Imperial College London, London, UK.
  • George Lam
    Imperial College School of Medicine, Imperial College London, London, UK.
  • Ana Baptista
    Imperial College School of Medicine, Imperial College London, Charing Cross Campus, London, W6 8RP, UK.
  • Risheka Walls
    Imperial College School of Medicine, Imperial College London, Charing Cross Campus, London, W6 8RP, UK.
  • Amir H Sam
    Imperial College School of Medicine, Imperial College London, London, UK.