Benchmarking Vision Capabilities of Large Language Models in Surgical Examination Questions.

Journal: Journal of surgical education
PMID:

Abstract

OBJECTIVE: Recent studies investigated the potential of large language models (LLMs) for clinical decision making and answering exam questions based on text input. Recent developments of LLMs have extended these models with vision capabilities. These image processing LLMs are called vision-language models (VLMs). However, there is limited investigation on the applicability of VLMs and their capabilities of answering exam questions with image content. Therefore, the aim of this study was to examine the performance of publicly accessible LLMs in 2 different surgical question sets consisting of text and image questions.

Authors

  • Jean-Paul Bereuter
    Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany. Electronic address: jean-paul.bereuter@uniklinikum-dresden.de.
  • Mark Enrik Geissler
    Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Anna Klimova
    Institute for Medical Informatics and Biometry, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Robert-Patrick Steiner
    Institute of Pharmacology and Toxicology, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Kevin Pfeiffer
    Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
  • Fiona R Kolbinger
    Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana.
  • Isabella C Wiest
    Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
  • Hannah Sophie Muti
    Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
  • Jakob Nikolas Kather
    Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.