Evaluating large language and large reasoning models as decision support tools in emergency internal medicine.

Journal: Computers in biology and medicine
Published Date:

Abstract

BACKGROUND: Large Language Models (LLMs) hold promise for clinical decision support, but their real-world performance varies. We compared three leading models, namely OpenAI's "o1" Large Reasoning Model (LRM), Anthropic's Claude-3.5-Sonnet, and Meta's Llama-3.2-70B, to human experts in an emergency internal medicine setting.

Authors

  • Josip Vrdoljak
    University of Split School of Medicine, Department for Pathophysiology, Croatia. Electronic address: josip.vrdoljak@mefst.hr.
  • Zvonimir Boban
    University of Split School of Medicine, Department for Medical Physics, Croatia.
  • Ivan Males
    Department of Abdominal Surgery, University Hospital of Split, Spinciceva 1, 21000, Split, Croatia.
  • Roko Skrabic
    University Hospital Split, Department of Nephrology, Split, Croatia.
  • Marko Kumric
    University of Split School of Medicine, Department for Pathophysiology, Croatia.
  • Anna Ottosen
    University of Split, School of Medicine, Split, Croatia.
  • Alexander Clemencau
    University of Split, School of Medicine, Split, Croatia.
  • Josko Bozic
    University of Split School of Medicine, Department for Pathophysiology, Croatia.
  • Sebastian Völker
    IU International University of Applied Sciences, Department of Health, Erfurt, Germany; Paracelsus Medical University, Institute of General Practice, Family Medicine and Preventive Medicine, Salzburg, Austria. Electronic address: sebastian.voelker@iu.org.