Comparative accuracy of ChatGPT-4, Microsoft Copilot and Google Gemini in the Italian entrance test for healthcare sciences degrees: a cross-sectional study.
Journal:
BMC medical education
Published Date:
Jun 26, 2024
Abstract
BACKGROUND: Artificial intelligence (AI) chatbots are emerging educational tools for students in healthcare science. However, assessing their accuracy is essential prior to adoption in educational settings. This study aimed to assess the accuracy of predicting the correct answers from three AI chatbots (ChatGPT-4, Microsoft Copilot and Google Gemini) in the Italian entrance standardized examination test of healthcare science degrees (CINECA test). Secondarily, we assessed the narrative coherence of the AI chatbots' responses (i.e., text output) based on three qualitative metrics: the logical rationale behind the chosen answer, the presence of information internal to the question, and presence of information external to the question.