Evaluating large language and large reasoning models as decision support tools in emergency internal medicine.
Journal:
Computers in Biology and Medicine
Published Date:
Jun 1, 2025
Abstract
BACKGROUND: Large Language Models (LLMs) hold promise for clinical decision support, but their real-world performance varies. We compared three leading models (OpenAI's "o1" Large Reasoning Model (LRM), Anthropic's Claude-3.5-Sonnet, and Meta's Llama-3.2-70B) to human experts in an emergency internal medicine setting.