Performance of Large Language Models in the Japanese Public Health Nurse National Examination: Comparative Cross-Sectional Study.

Journal: JMIR Nursing

Published Date: Feb 20, 2026

Abstract

BACKGROUND: Large language models (LLMs) have shown promising results on Japanese national medical and nursing examinations. However, no study has evaluated LLM performance on the Japanese Public Health Nurse National Examination, which requires specialized knowledge in community health and public health nursing practice.

OBJECTIVE: This study aimed to compare the performance of multiple LLMs on the Japanese Public Health Nurse National Examination and evaluate their potential utility in public health nursing education.

METHODS: Three LLMs were evaluated: GPT-4o, Claude Opus 4, and Gemini 2.5 Pro. All 110 questions from the 111th Public Health Nurse National Examination were administered using standardized prompts. Questions were classified by format (text vs figure or calculation), content (general vs situational), and selection type (single vs multiple choice). Accuracy rates and 95% CIs were calculated, with statistical comparisons performed using chi-square tests.

RESULTS: All LLMs exceeded the passing criterion (60%). The accuracy rates were as follows: 85.5% (94/110) for GPT-4o (95% CI 77.5%-91.5%), 91.8% (101/110) for Claude Opus 4 (95% CI 85.0%-96.2%), and 92.7% (102/110) for Gemini 2.5 Pro (95% CI 86.2%-96.8%). No significant differences were found among the LLMs (P>.99). However, all models showed lower accuracy on multiple-choice questions than on single-choice questions, with significant intramodel differences observed for GPT-4o (10/16, 62.5% vs 82/92, 89.1%; P=.01) and Claude Opus 4 (12/16, 75% vs 87/92, 94.6%; P=.03).

CONCLUSIONS: LLMs demonstrated high performance on a public health nursing examination but showed limitations in complex reasoning requiring multiple-choice selection. These findings suggest the potential for LLM use as educational support tools while highlighting the need for cautious implementation in specialized nursing education.
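The abstract reports each model's accuracy with a 95% CI but does not say which interval method was used. The sketch below assumes exact (Clopper-Pearson) binomial intervals and recomputes them from the correct-answer counts given in the Results, using only the Python standard library. The function names (`binom_cdf`, `clopper_pearson`) are hypothetical illustrations, not code from the study.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a function f that increases on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for the proportion k/n.

    Assumption: the study's CI method is not stated; Clopper-Pearson is assumed.
    """
    # Lower bound: the p at which P(X >= k) equals alpha/2 (P(X >= k) rises with p).
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha/2 (P(X <= k) falls with p).
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# Correct-answer counts out of 110 questions, as reported in the abstract.
N_QUESTIONS = 110
RESULTS = {"GPT-4o": 94, "Claude Opus 4": 101, "Gemini 2.5 Pro": 102}

for model, correct in RESULTS.items():
    lo, hi = clopper_pearson(correct, N_QUESTIONS)
    print(f"{model}: {correct / N_QUESTIONS:.1%} ({correct}/{N_QUESTIONS}), "
          f"95% CI {lo:.1%}-{hi:.1%}")
```

Under that assumption, the printed intervals can be compared directly with those in the Results. Exact intervals are conservative; an approximate method such as the Wilson interval would typically give somewhat narrower bounds for the same counts.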
