Benchmarking Vision Capabilities of Large Language Models in Surgical Examination Questions.

Journal: Journal of Surgical Education

PMID: 39923296

Abstract

OBJECTIVE: Recent studies have investigated the potential of large language models (LLMs) for clinical decision-making and for answering exam questions based on text input. Recent developments have extended these models with vision capabilities; such image-processing LLMs are called vision-language models (VLMs). However, the applicability of VLMs and their ability to answer exam questions with image content have received limited investigation. Therefore, the aim of this study was to examine the performance of publicly accessible LLMs on 2 different surgical question sets consisting of text and image questions.

Authors

Jean-Paul Bereuter

Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany. Electronic address: jean-paul.bereuter@uniklinikum-dresden.de.

Mark Enrik Geissler

Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.

Anna Klimova

Institute for Medical Informatics and Biometry, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.

Robert-Patrick Steiner

Institute of Pharmacology and Toxicology, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.

Kevin Pfeiffer

Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.

Fiona R Kolbinger

Department of Visceral, Thoracic and Vascular Surgery, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana.

Isabella C Wiest

Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany; Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.

Hannah Sophie Muti

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.

Jakob Nikolas Kather

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.

Keywords

Benchmarking; Clinical Competence; Educational Measurement; General Surgery; Germany; Humans; Language; Large Language Models; Licensure, Medical; United States

