Performance evaluation of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions: A comparative analysis.

Journal: Journal of clinical neuroscience : official journal of the Neurosurgical Society of Australasia
PMID:

Abstract

INTRODUCTION: Artificial intelligence (AI) has gained significant attention in medicine, particularly in neurosurgery, where its potential is often discussed and occasionally feared. Large language models (LLMs), such as ChatGPT-4.0 (OpenAI) and Gemini (formerly known as Bard, Google DeepMind), have shown promise in text-based tasks but remain underexplored in image-based domains, which are essential to neurosurgery. This study evaluates the performance of ChatGPT-4.0 and Gemini on image-based neurosurgery board practice questions, focusing on their ability to interpret visual data, a critical aspect of neurosurgical decision-making.

Authors

  • Alana M McNulty
    Department of Neurosurgery, Albany Medical Center, Albany, NY, USA.
  • Harshitha Valluri
    Department of Neurosurgery, Albany Medical Center, Albany, NY, USA.
  • Avi A Gajjar
    Department of Neurological Surgery, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.
  • Amanda Custozzo
    Department of Neurosurgery, Albany Medical Center, Albany, NY, USA.
  • Nicholas C Field
    Department of Neurosurgery, Albany Medical Center, Albany, NY, USA.
  • Alexandra R Paul
    Department of Neurosurgery, Albany Medical Center, Albany, NY, USA. Electronic address: PaulA1@amc.edu.