Evaluating the effectiveness of advanced large language models in medical knowledge: A comparative study using Japanese national medical examination.

Journal: International journal of medical informatics

PMID: 39471700

Abstract

Study aims and objectives: This study evaluates the accuracy of the medical knowledge of the most advanced large language models (LLMs) available as of 2024: GPT-4o, GPT-4, Gemini 1.5 Pro, and Claude 3 Opus. It is the first to evaluate these LLMs using a non-English medical licensing exam. The findings are intended to guide educators, policymakers, and technical experts in the effective use of AI in medical education and clinical diagnosis.

Authors

Mingxin Liu

College of Electronics and Information Engineering, Guangdong Ocean University, Zhanjiang, China.

Tsuyoshi Okuhara

University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.

Zhehao Dai

Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan. Electronic address: daizh@luke.ac.jp.

Wenbo Huang

Lin Gu

Hiroko Okada

University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.

Emi Furukawa

University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.

Takahiro Kiuchi

University Hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8655, Japan.

Keywords

Artificial Intelligence; Clinical Competence; Education, Medical; Educational Measurement; Japan; Licensure, Medical

