Performance of Large Language Models in Supporting Medical Diagnosis and Treatment
Journal: arXiv
Published Date: Apr 14, 2025
Abstract
The integration of Large Language Models (LLMs) into healthcare holds
significant potential to enhance diagnostic accuracy and support medical
treatment planning. These AI-driven systems can analyze vast datasets,
assisting clinicians in identifying diseases, recommending treatments, and
predicting patient outcomes. This study evaluates the performance of a range of
contemporary LLMs, including both open-source and closed-source models, on the
2024 Portuguese National Exam for medical specialty access (PNA), a
standardized medical knowledge assessment. Our results show considerable
variation in accuracy and cost-effectiveness, with several models exceeding
the performance of human medical students on this specific exam. We identify
the leading models using a combined score of accuracy and cost, discuss the
implications of reasoning methods such as Chain-of-Thought prompting, and
underscore the potential of LLMs to serve as complementary tools that assist
medical professionals in complex clinical decision-making.