Educational Measurement - AI Medical Compendium

Evaluating Large Language Models on American Board of Anesthesiology-style Anesthesiology Questions: Accuracy, Domain Consistency, and Clinical Implications.

Journal of cardiothoracic and vascular anesthesia Sep 1, 2025

Recent advances in large language models (LLMs) have led to growing interest in their potential applications in medical education and clinical practice. This study evaluated whether five widely used and highly developed LLMs-ChatGPT-4, Gemini, Claude...

Clinical Competence; Educational Measurement; Humans; Language; Large Language Models; Specialty Boards; United States; Anesthesiology

LLM-Generated multiple choice practice quizzes for preclinical medical students.

Advances in physiology education Sep 1, 2025

Multiple choice questions (MCQs) are frequently used in medical education for assessment. Automated generation of MCQs in board-exam format could potentially save significant effort for faculty and generate a wider set of practice materials for stude...

Education, Medical, Undergraduate; Humans; Students, Medical; Educational Measurement; Feasibility Studies

Comparison of a generative large language model to pharmacy student performance on therapeutics examinations.

Currents in pharmacy teaching & learning Sep 1, 2025

OBJECTIVE: To compare the performance of a generative language model (ChatGPT-3.5) to pharmacy students on therapeutics examinations.

Humans; Education, Pharmacy; Large Language Models; Language; Students, Pharmacy; Educational Measurement

Performance of ChatGPT-4 Omni and Gemini 1.5 Pro on Ophthalmology-Related Questions in the Turkish Medical Specialty Exam.

Turkish journal of ophthalmology Aug 21, 2025

OBJECTIVES: To evaluate the response and interpretative capabilities of two pioneering artificial intelligence (AI)-based large language model (LLM) platforms in addressing ophthalmology-related multiple-choice questions (MCQs) from Turkish Medical S...

Artificial Intelligence; Clinical Competence; Humans; Ophthalmology; Turkey; Education, Medical, Graduate; Educational Measurement; Generative Artificial Intelligence

Evaluating the Performance of Reasoning Large Language Models on Japanese Radiology Board Examination Questions.

Academic radiology Aug 1, 2025

RATIONALE AND OBJECTIVES: This study evaluates the performance, cost, and processing time of OpenAI's reasoning large language models (LLMs) (o1-preview, o1-mini) and their base models (GPT-4o, GPT-4o-mini) on Japanese radiology board examination que...

Educational Measurement; Radiology; Large Language Models; Specialty Boards; Japan; Language

AI in action: Changes to student perceptions when using generative artificial intelligence for the creation of a multimedia project-based assessment.

European journal of pharmacology Jul 5, 2025

INTRODUCTION: New modes of assessment are needed to evaluate the authenticity of student learning in an artificial intelligence (AI) world. In mid-2023, we piloted a new assessment type: a collaborative group multimedia assessment with AI allowan...

Artificial Intelligence; Female; Multimedia; Students; Male; Humans; Students, Pharmacy; Educational Measurement; Generative Artificial Intelligence; Perception

Artificial intelligence (AI) performance on pharmacy skills laboratory course assignments.

Currents in pharmacy teaching & learning Jul 1, 2025

OBJECTIVE: To compare pharmacy student scores to scores of artificial intelligence (AI)-generated results of three common platforms on pharmacy skills laboratory assignments.

Educational Measurement; Curriculum; Humans; Reproducibility of Results; Students, Pharmacy; Artificial Intelligence; Clinical Competence; Education, Pharmacy

Comparison of ChatGPT-4, Copilot, Bard and Gemini Ultra on an Otolaryngology Question Bank.

Clinical otolaryngology : official journal of ENT-UK ; official journal of Netherlands Society for Oto-Rhino-Laryngology & Cervico-Facial Surgery Jul 1, 2025

OBJECTIVE: To compare the performance of Google Bard, Microsoft Copilot, GPT-4 with vision (GPT-4) and Gemini Ultra on the OTO Chautauqua, a student-created, faculty-reviewed otolaryngology question bank.

Otolaryngology; Generative Artificial Intelligence; Education, Medical; Humans; Surveys and Questionnaires; Educational Measurement

Analysis of ChatGPT-4's performance on ophthalmology questions from the MIR exam.

Archivos de la Sociedad Espanola de Oftalmologia Jun 1, 2025

PURPOSE: To evaluate the performance of ChatGPT in solving clinical scenarios in ophthalmology, specifically questions from the specialty exams for Resident Medical Interns (MIR).

Ophthalmology; Educational Measurement; Humans; Generative Artificial Intelligence; Internship and Residency; Cross-Sectional Studies

Comparison of ChatGPT plus (version 4.0) and pretrained AI model (Orthopod) on orthopaedic in-training exam (OITE).

The surgeon : journal of the Royal Colleges of Surgeons of Edinburgh and Ireland Jun 1, 2025

INTRODUCTION: Recent advancements in large language model (LLM) artificial intelligence (AI) systems, like ChatGPT, have showcased ability in answering standardized examination questions, but their performance is variable. The goal of this study was ...

Orthopedics; Education, Medical, Graduate; Internship and Residency; Humans; Artificial Intelligence; Educational Measurement; Clinical Competence; Generative Artificial Intelligence
