LLMs in Disease Diagnosis: A Comparative Study of DeepSeek-R1 and O3 Mini Across Chronic Health Conditions
Journal: arXiv
Published Date: Mar 13, 2025
Abstract
Large Language Models (LLMs) are revolutionizing medical diagnostics by
enhancing both disease classification and clinical decision-making. In this
study, we evaluate the performance of two LLM-based diagnostic tools, DeepSeek
R1 and O3 Mini, using a structured dataset of symptoms and diagnoses. We
assessed their predictive accuracy at both the disease and category levels, as
well as the reliability of their confidence scores. DeepSeek R1 achieved a
disease-level accuracy of 76% and an overall accuracy of 82%, outperforming O3
Mini, which attained 72% and 75%, respectively. Notably, DeepSeek R1
demonstrated exceptional performance in Mental Health, Neurological Disorders,
and Oncology, where it reached 100% accuracy, while O3 Mini excelled in
Autoimmune Disease classification with 100% accuracy. Both models, however,
struggled with Respiratory Disease classification, recording accuracies of only
40% for DeepSeek R1 and 20% for O3 Mini. Additionally, the analysis of
confidence scores revealed that DeepSeek R1 provided high-confidence
predictions in 92% of cases, compared to 68% for O3 Mini. Ethical
considerations regarding bias, model interpretability, and data privacy are
also discussed to ensure the responsible integration of LLMs into clinical
practice. Overall, our findings offer valuable insights into the strengths and
limitations of LLM-based diagnostic systems and provide a roadmap for future
enhancements in AI-driven healthcare.
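
The three kinds of metric named in the abstract (disease-level accuracy, category-level accuracy, and the share of high-confidence predictions) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the `Prediction` record, its field names, the "high"/"medium"/"low" confidence labels, and the four toy records are all assumptions made for illustration, and the numbers it prints have no relation to the results reported above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record layout; the paper's dataset schema may differ.
    true_disease: str
    true_category: str
    pred_disease: str
    pred_category: str
    confidence: str  # assumed labels: "high", "medium", or "low"


def disease_accuracy(preds):
    """Fraction of cases where the exact disease was predicted."""
    return sum(p.pred_disease == p.true_disease for p in preds) / len(preds)


def category_accuracy(preds):
    """Fraction of cases where the disease category was predicted."""
    return sum(p.pred_category == p.true_category for p in preds) / len(preds)


def high_confidence_rate(preds):
    """Fraction of predictions the model labeled as high confidence."""
    return sum(p.confidence == "high" for p in preds) / len(preds)


def per_category_disease_accuracy(preds):
    """Disease-level accuracy broken down by true category."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p in preds:
        totals[p.true_category] += 1
        hits[p.true_category] += p.pred_disease == p.true_disease
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Toy records invented for this example only (not data from the study).
toy = [
    Prediction("Asthma", "Respiratory", "Asthma", "Respiratory", "high"),
    Prediction("COPD", "Respiratory", "Bronchitis", "Respiratory", "high"),
    Prediction("Lupus", "Autoimmune", "Lupus", "Autoimmune", "low"),
    Prediction("Migraine", "Neurological", "Depression", "Mental Health", "high"),
]

print(disease_accuracy(toy))               # 0.5
print(category_accuracy(toy))              # 0.75
print(high_confidence_rate(toy))           # 0.75
print(per_category_disease_accuracy(toy))  # {'Respiratory': 0.5, 'Autoimmune': 1.0, 'Neurological': 0.0}
```

Note that the high-confidence rate only counts how often a model claims high confidence. Judging whether those confidence scores are reliable would additionally require comparing accuracy within each confidence level.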