Evaluating the effectiveness of advanced large language models in medical knowledge: a comparative study using Japanese national medical examination.

Journal: International journal of medical informatics

Abstract

Study aims and objectives: This study evaluates the accuracy of medical knowledge in the most advanced large language models (LLMs) as of 2024: GPT-4o, GPT-4, Gemini 1.5 Pro, and Claude 3 Opus. It is the first to evaluate these LLMs using a non-English medical licensing exam. The findings are intended to guide educators, policymakers, and technical experts in the effective use of AI in medical education and clinical diagnosis.

Authors

  • Mingxin Liu
    College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, China.
  • Tsuyoshi Okuhara
    University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.
  • Zhehao Dai
    Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan. Electronic address: daizh@luke.ac.jp.
  • Wenbo Huang
  • Lin Gu
  • Hiroko Okada
    University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.
  • Emi Furukawa
    University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.
  • Takahiro Kiuchi
    University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.