Medicine on the Edge: Comparative Performance Analysis of On-Device LLMs for Clinical Reasoning
Journal:
arXiv
Published Date:
Feb 13, 2025
Abstract
The deployment of Large Language Models (LLM) on mobile devices offers
significant potential for medical applications, enhancing privacy, security,
and cost-efficiency by eliminating reliance on cloud-based services and keeping
sensitive health data local. However, the performance and accuracy of on-device
LLMs in real-world medical contexts remain underexplored. In this study, we
benchmark publicly available on-device LLMs using the AMEGA dataset, evaluating
accuracy, computational efficiency, and thermal limitation across various
mobile devices. Our results indicate that compact general-purpose models like
Phi-3 Mini achieve a strong balance between speed and accuracy, while medically
fine-tuned models such as Med42 and Aloe attain the highest accuracy. Notably,
deploying LLMs on older devices remains feasible, with memory constraints
posing a greater challenge than raw processing power. Our study underscores the
potential of on-device LLMs for healthcare while emphasizing the need for more
efficient inference and models tailored to real-world clinical reasoning.