GPT-4 versus human authors in clinically complex MCQ creation: A blinded analysis of item quality.

Journal: Medical Teacher

Published Date: May 29, 2025

Abstract

PURPOSE: To compare the structural quality of multiple-choice questions (MCQs) generated by GPT-4, a large language model and a type of artificial intelligence (AI), with that of human-authored items at both novice and expert levels.

Authors

Hannah Wu
Adelaide Medical School, University of Adelaide, Adelaide, Australia.

Toby Zerner
Faculty of Health and Medical Sciences, University of Adelaide, Australia (T.Z., T.K., J.J.).

Daniel Lee
Medical College of Georgia, Augusta University, 1120 15th St. Augusta, GA 30912, USA.

Stefan Court-Kowalski
Adelaide Medical School, University of Adelaide, Adelaide, Australia.

Peter Devitt
eMedici, Adelaide, Australia.

Edward Palmer
Bloomsbury Institute of Intensive Care Medicine, University College London, London, UK. edward.palmer@ucl.ac.uk.

External Resources

PubMed ID: 40439661
