AIMC Topic: Educational Measurement

Showing 81 to 90 of 311 articles

Assessing the performance of ChatGPT-4o on the Turkish Orthopedics and Traumatology Board Examination.

Joint diseases and related surgery
OBJECTIVES: This study aims to assess the overall performance of ChatGPT version 4-omni (GPT-4o) on the Turkish Orthopedics and Traumatology Board Examination (TOTBE) using actual examinees as a reference point to evaluate and compare the performance...

The performance of ChatGPT and ERNIE Bot in surgical resident examinations.

International journal of medical informatics
STUDY PURPOSE: To assess the application of these two large language models (LLMs) for surgical resident examinations and to compare the performance of these LLMs with that of human residents.

Using a Hybrid of AI and Template-Based Method in Automatic Item Generation to Create Multiple-Choice Questions in Medical Education: Hybrid AIG.

JMIR formative research
BACKGROUND: Template-based automatic item generation (AIG) is more efficient than traditional item writing but it still heavily relies on expert effort in model development. While nontemplate-based AIG, leveraging artificial intelligence (AI), offers...

Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE.

JAMA network open
IMPORTANCE: Large language models (LLMs) are being implemented in health care. Enhanced accuracy and methods to maintain accuracy over time are needed to maximize LLM benefits.

Evaluating the value of AI-generated questions for USMLE step 1 preparation: A study using ChatGPT-3.5.

Medical teacher
PURPOSE: Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematica...

Accuracy of LLMs in medical education: evidence from a concordance test with medical teacher.

BMC medical education
BACKGROUND: There is an unprecedented increase in the use of Generative AI in medical education. There is a need to assess these models' accuracy to ensure patient safety. This study assesses the accuracy of ChatGPT, Gemini, and Copilot in answering ...

Accuracy and quality of ChatGPT-4o and Google Gemini performance on image-based neurosurgery board questions.

Neurosurgical review
Large language models (LLMs) have shown the capability to effectively answer medical board examination questions. However, their ability to answer image-based questions has not been examined. This study sought to evaluate the performance of two LLMs (...

Performance of Plug-In Augmented ChatGPT and Its Ability to Quantify Uncertainty: Simulation Study on the German Medical Board Examination.

JMIR medical education
BACKGROUND: The GPT-4 is a large language model (LLM) trained and fine-tuned on an extensive dataset. After the public release of its predecessor in November 2022, the use of LLMs has seen a significant spike in interest, and a multitude of potential...

A tutorial activity for students to experience generative artificial intelligence: students' perceptions and actions.

Advances in physiology education
Freely accessible generative artificial intelligence (GenAI) poses challenges to physiology education regarding learning and academic integrity. Although many studies have explored the capabilities of GenAI to complete assessments, few have implement...

Using aggregated AI detector outcomes to eliminate false positives in STEM-student writing.

Advances in physiology education
Generative artificial intelligence (AI) large language models have become sufficiently accessible and user-friendly to assist students with course work, studying tactics, and written communication. AI-generated writing is almost indistinguishable fro...