Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination.

Journal: Scientific Reports
PMID:

Abstract

This study compares and evaluates the performance of GPT-3.5, GPT-4, and GPT-4o on the 2020 and 2021 Chinese National Medical Licensing Examination (NMLE), exploring their potential value in medical education and clinical applications. Six hundred original test questions from the 2020 and 2021 NMLE (covering five question types) were selected and input into GPT-3.5, GPT-4, and GPT-4o for response. The accuracy of the models across question types and units was recorded and analyzed, and statistical methods were used to compare performance among the three models. GPT-4o demonstrated significantly higher overall accuracy than GPT-4 and GPT-3.5 (P < 0.001). On the 2020 and 2021 exams, GPT-4o achieved accuracy rates of 84.2% and 88.2%, respectively, with its highest accuracy, 94.75%, on questions related to the digestive system (Unit 3). GPT-4 showed moderate performance, while GPT-3.5 had the lowest accuracy. GPT-4o also showed a clear advantage in complex question formats, such as case analysis questions (A3/A4 type) and standard matching questions (B1 type). GPT-4o outperformed its predecessors on the NMLE, demonstrating strong comprehension and problem-solving abilities in a non-English medical examination. These findings offer insights into the application and promotion of generative AI in medical education and clinical practice.
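The abstract does not name the statistical test used to compare the three models. As a minimal sketch, assuming a Pearson chi-square test of independence on correct/incorrect counts (one common choice for this kind of comparison, not necessarily the authors' method), the calculation could look like the following. The function names and the counts are hypothetical placeholders for illustration only; they are not the study's data.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that accuracy is the same for every model.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p-value of a chi-square statistic; valid only for 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# PLACEHOLDER counts (correct, incorrect) per model; NOT taken from the paper.
placeholder_counts = {
    "model_a": (50, 50),
    "model_b": (70, 30),
    "model_c": (90, 10),
}

stat, dof = chi_square_independence(list(placeholder_counts.values()))
p_value = p_value_two_dof(stat)  # three models x two outcomes gives dof == 2
print(f"chi-square = {stat:.3f}, dof = {dof}, p = {p_value:.2e}")
```

With three models and two outcomes the test has 2 degrees of freedom, which is why the closed-form tail probability exp(-x/2) applies here; a table of any other shape would need a general chi-square distribution function instead.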

Authors

  • Dingyuan Luo
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Mengke Liu
    Department of Radiology, Affiliated Shandong Provincial Hospital, Shandong First Medical University, Jinan, 250021, Shandong, China.
  • Runyuan Yu
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Yulian Liu
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Wenjun Jiang
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Qi Fan
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Naifeng Kuang
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China.
  • Qiang Gao
    Faculty of Material Science and Chemistry, China University of Geosciences, Wuhan 430074, PR China.
  • Tao Yin
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China. yintaokfk@163.com.
  • Zuncheng Zheng
    Department of Rehabilitation Medicine Center, Affiliated Tai'an Central Hospital, Qingdao University, No. 29, Longtan Road, Taishan District, Tai'an City, 271000, Shandong, China. zxyyzhengzuncheng@126.com.