Can American Board of Surgery in Training Examinations be passed by Large Language Models? Comparative assessment of Gemini, Copilot, and ChatGPT.

Journal: The American Surgeon
Published Date:

Abstract

Objective: This study aimed to evaluate the performance of large language models (LLMs) in answering questions from the American Board of Surgery In-Training Examination (ABSITE).

Methods: Multiple-choice questions from the ABSITE Quiz were entered as prompts into the most popular LLMs: ChatGPT-4 (OpenAI), Copilot (Microsoft), and Gemini (Google). The study comprised 170 questions from 2017 to 2022, divided into four subgroups: Definitions, Biochemistry/Pharmaceutical, Case Scenario, and Treatment & Surgical Procedures. All questions were submitted to the LLMs between October 1, 2024, and October 5, 2024, and the correct answer rate of each LLM was evaluated.

Results: The correct response rates for all questions were 79.4% for ChatGPT, 77.6% for Copilot, and 52.9% for Gemini, with Gemini significantly lower than both other LLMs (P < 0.001). In the Definitions category, the correct response rates were 93.5% for ChatGPT, 90.3% for Copilot, and 64.5% for Gemini, with Gemini significantly lower (P = 0.005 and P = 0.015, respectively). In the Biochemistry/Pharmaceutical category, the correct response rate was the same for all three LLMs (83.3%). In the Case Scenario category, the correct response rates were 76.3% for ChatGPT, 72.8% for Copilot, and 46.5% for Gemini, with Gemini significantly lower (P < 0.001). In the Treatment & Surgical Procedures category, the correct response rates were 69.2% for ChatGPT, 84.6% for Copilot, and 53.8% for Gemini. Although Gemini had the lowest accuracy in this category, the difference was not statistically significant (P = 0.236).

Conclusion: On the ABSITE Quiz, ChatGPT and Copilot performed similarly, whereas Gemini lagged significantly behind.
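
The reported rates can be sanity-checked with a short script. The sketch below is illustrative only and rests on two assumptions that are not stated in the abstract: the per-category question counts (31, 12, 114, 13) and the correct-answer counts are back-calculated so that each rounds to a reported percentage and the totals sum to 170, and the comparison uses an uncorrected Pearson chi-square test, since the abstract does not name the statistical test used. All names (`TOTALS`, `CORRECT`, `rate`, `pearson_chi2`, `p_value`) are hypothetical.

```python
import math

# ASSUMPTION: the abstract reports percentages only. These counts are
# back-calculated so that each one rounds to the reported rate.
TOTALS = {
    "Definitions": 31,
    "Biochemistry/Pharmaceutical": 12,
    "Case Scenario": 114,
    "Treatment & Surgical Procedures": 13,
}
CATEGORIES = list(TOTALS)
CORRECT = {  # correct answers per category, in CATEGORIES order
    "ChatGPT": [29, 10, 87, 9],
    "Copilot": [28, 10, 83, 11],
    "Gemini": [20, 10, 53, 7],
}


def counts(model, category=None):
    """Return (correct, total) for one category, or overall if category is None."""
    if category is None:
        return sum(CORRECT[model]), sum(TOTALS.values())
    return CORRECT[model][CATEGORIES.index(category)], TOTALS[category]


def rate(model, category=None):
    """Correct response rate in percent, rounded to one decimal place."""
    correct, total = counts(model, category)
    return round(100 * correct / total, 1)


def pearson_chi2(models, category=None):
    """Uncorrected Pearson chi-square on a (models x [correct, wrong]) table.

    ASSUMPTION: this is one plausible test, not necessarily the authors' method.
    Returns (statistic, degrees_of_freedom).
    """
    table = []
    for model in models:
        correct, total = counts(model, category)
        table.append([correct, total - correct])
    n = sum(sum(row) for row in table)
    col_sums = [sum(row[j] for row in table) for j in range(2)]
    stat = 0.0
    for row in table:
        for j in range(2):
            expected = sum(row) * col_sums[j] / n
            stat += (row[j] - expected) ** 2 / expected
    return stat, len(table) - 1


def p_value(stat, df):
    """Chi-square upper-tail probability (closed forms for df = 1 and df = 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


if __name__ == "__main__":
    for model in CORRECT:
        print(model, rate(model), [rate(model, c) for c in CATEGORIES])
    print(p_value(*pearson_chi2(["ChatGPT", "Gemini"], "Definitions")))
    print(p_value(*pearson_chi2(list(CORRECT), "Treatment & Surgical Procedures")))
```

Under these assumptions, the back-calculated counts reproduce every percentage in the abstract, and the uncorrected test gives values close to the reported P = 0.005 (ChatGPT versus Gemini, Definitions) and P = 0.236 (three-way comparison, Treatment & Surgical Procedures). This agreement is suggestive rather than conclusive; the full article should be consulted for the actual counts and methods.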

Authors

  • Ahmet Necati Sanli
    Department of General Surgery, Abdulkadir Yuksel State Hospital, Gaziantep, Turkey.
  • Deniz Esin Tekcan Sanli
    Department of Radiology, School of Medicine, Gaziantep University, Gaziantep, Turkey.
  • Ali Karabulut
    Department of General Surgery, Bagcilar Training and Research, University of Health Sciences, Istanbul, Turkey.
