Assessment of Large Language Model Performance on Medical School Essay-Style Concept Appraisal Questions: Exploratory Study.
Journal:
JMIR Medical Education
Published Date:
Jun 16, 2025
Abstract
Bing Chat (subsequently renamed Microsoft Copilot), a large language model based on ChatGPT 4.0, performed comparably to medical students in answering essay-style concept appraisal questions, and assessors struggled to distinguish artificial intelligence (AI) responses from human responses. These results highlight the need to prepare students and educators for a future shaped by AI by fostering reflective learning practices and critical thinking.