DeepSeek-R1 and GPT-4 are comparable in a complex diagnostic challenge: a historical control study.

Journal: International Journal of Surgery (London, England)
Published Date:

Abstract

BACKGROUND: Large language models (LLMs) have demonstrated potential in medical diagnostics, but their accuracy in complex cases remains a subject of investigation. DeepSeek-R1, an open-source model with advanced reasoning capabilities, has gained global attention. This study evaluates the diagnostic performance of DeepSeek-R1 compared to GPT-4 in complex clinical cases.

Authors

  • Lining Chan
    Department of Plastic Surgery, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China.
  • Xinjie Xu
  • Kaiyang Lv