Artificial intelligence-assisted feedback in pharmacology education: a pilot evaluation of a custom generative model.

Journal: BMC medical education
Published Date:

Abstract

BACKGROUND: Timely, individualised feedback is central to effective learning in medical education but remains resource intensive, particularly when based on short answer questions (SAQs). Large language models (LLMs) offer potential to support feedback provision, yet prospective evaluations of their accuracy and educational value remain limited. This pilot study evaluated the feasibility, accuracy and student perceptions of AI-generated pharmacology SAQ feedback.
METHODS: A prospective pilot study was conducted within a UK Physician Associate MSc programme between March and October 2025. Students were invited to complete voluntary formative pharmacology quizzes, each comprising four SAQs. A custom Generative Pre-trained Transformer (GPT), built on a GPT-4 family LLM and guided by a standardised rubric and references, generated structured feedback. All outputs underwent faculty moderation prior to release. Student perceptions were assessed using a 5-point Likert-scale survey. Twenty-five percent of student responses were double marked and independently reviewed in a blinded comparison of AI and faculty feedback.
RESULTS: Twenty students submitted complete responses, generating feedback for a total of 80 questions. Mean AI feedback generation time was 34 s per quiz, compared with 508 s for faculty marking, representing a 15-fold reduction. Twelve of 20 (60%) feedback files required no modification before release, while eight required amendments, including four major corrections. Even with faculty modification, there were substantial efficiency gains from AI-generated feedback. Eleven students (55%) completed the survey, reporting favourable perceptions of clarity, actionability, confidence and overall usefulness (median 4 out of 5) for the moderated feedback. In an exploratory blinded review of five double-marked submissions, no statistically significant differences were detected between AI-generated and faculty-generated feedback across rated domains (p > 0.23 for all domains).
CONCLUSIONS: A custom GPT-based LLM delivered rapid, structured pharmacology feedback with substantial efficiency gains and positive student perceptions. However, clinically important errors occurred, necessitating consistent faculty oversight. Generative models may augment formative assessment in medical education, but rigorous calibration, moderation and ethical safeguards remain essential for safe implementation.

Authors
