AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination.

Journal: BMC Medical Education
PMID:

Abstract

BACKGROUND: The creation of high-quality multiple-choice questions (MCQs) is essential for medical education assessments but is resource-intensive and time-consuming when done by human experts. Large language models (LLMs) like ChatGPT-4o offer a promising alternative, but their efficacy remains unclear, particularly in high-stakes exams.

Authors

  • Alex KK Law
    The Accident and Emergency Medicine Academic Unit (AEMAU), The Chinese University of Hong Kong (CUHK), 2nd Floor, Main Clinical Block and Trauma Centre, Prince of Wales Hospital, Shatin, Hong Kong, China. alexlaw@cuhk.edu.hk.
  • Jerome So
    Department of Accident & Emergency, Tseung Kwan O Hospital, Hong Kong, China.
  • Chun Tat Lui
    Hong Kong College of Emergency Medicine, Hong Kong, China.
  • Yu Fai Choi
    Hong Kong College of Emergency Medicine, Hong Kong, China.
  • Koon Ho Cheung
    Hong Kong College of Emergency Medicine, Hong Kong, China.
  • Kevin Kei-Ching Hung
    The Accident and Emergency Medicine Academic Unit (AEMAU), The Chinese University of Hong Kong (CUHK), 2nd Floor, Main Clinical Block and Trauma Centre, Prince of Wales Hospital, Shatin, Hong Kong, China.
  • Colin Alexander Graham
    The Accident and Emergency Medicine Academic Unit (AEMAU), The Chinese University of Hong Kong (CUHK), 2nd Floor, Main Clinical Block and Trauma Centre, Prince of Wales Hospital, Shatin, Hong Kong, China.