Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evaluation of Accuracy in Text-Only and Image-Based Questions.

Journal: JMIR Medical Education
PMID:

Abstract

This study evaluated the performance of ChatGPT with GPT-4 Omni (GPT-4o) on the 118th Japanese Medical Licensing Examination, covering both text-only and image-based questions. The model achieved high overall accuracy, with no significant difference in performance between the two question types. Common errors included mistakes in clinical judgment and in prioritization, underscoring the need for further improvement in the integration of artificial intelligence into medical education and practice.

Authors

  • Yuki Miyazaki
    Department of Psychiatry, Osaka University Graduate School of Medicine, Osaka, Japan.
  • Masahiro Hata
    Department of Psychiatry, Osaka University Graduate School of Medicine, Osaka, Japan.
  • Hisaki Omori
    Department of Psychiatry, Osaka University Graduate School of Medicine, Suita, Japan.
  • Atsuya Hirashima
    Department of Psychiatry, Osaka University Graduate School of Medicine, Suita, Japan.
  • Yuta Nakagawa
  • Mitsuhiro Eto
    Department of Psychiatry, Osaka University Graduate School of Medicine, Suita, Japan.
  • Shun Takahashi
    Department of Psychiatry, Osaka University Graduate School of Medicine, Suita, Japan.
  • Manabu Ikeda
    Department of Psychiatry, Osaka University Graduate School of Medicine, Osaka, Japan.