Benchmark evaluation of DeepSeek large language models in clinical decision-making.

Journal: Nature Medicine
Published Date:

Abstract

Large language models (LLMs) are increasingly transforming medical applications. However, proprietary models such as GPT-4o face significant barriers to clinical adoption because they cannot be deployed on site within healthcare institutions, making them noncompliant with stringent privacy regulations. Recent advancements in open-source LLMs such as DeepSeek models offer a promising alternative because they allow efficient fine-tuning on local data in hospitals with advanced information technology infrastructure. Here, to demonstrate the clinical utility of DeepSeek-V3 and DeepSeek-R1, we benchmarked their performance on clinical decision support tasks against proprietary LLMs, including GPT-4o and Gemini-2.0 Flash Thinking Experimental. Using 125 patient cases with sufficient statistical power, covering a broad range of frequent and rare diseases, we found that DeepSeek models perform as well as, and in some cases better than, proprietary LLMs. Our study demonstrates that open-source LLMs can provide a scalable pathway for secure model training, enabling real-world medical applications in accordance with data privacy and healthcare regulations.

Authors

  • Sarah Sandmann
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Stefan Hegselmann
    Center for Digital Health, Berlin Institute of Health, Charité - University Medicine Berlin, Berlin, Germany.
  • Michael Fujarski
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Lucas Bickmann
    Institute of Medical Informatics, University of Münster, Münster, Germany.
  • Benjamin Wild
    Center for Digital Health, Berlin Institute of Health, Charité - University Medicine Berlin, Berlin, Germany.
  • Roland Eils
    Center for Digital Health, Berlin Institute of Health, Charité - University Medicine Berlin, Berlin, Germany. roland_eils@fudan.edu.cn.
  • Julian Varghese
    Institute of Medical Data Science, Otto-von-Guericke University, Magdeburg, Germany.
