Evaluating Chat Generative Pretrained Transformer (GPT-4o) Problem-Solving Performance in the Japan Certificate Examination for Biomedical Engineering Class 1.
Journal:
Cureus
Published Date:
Mar 23, 2025
Abstract
Introduction
Chat generative pretrained transformer (ChatGPT; OpenAI, San Francisco, CA) has developed rapidly and is used in various fields, including medical engineering. Japan's Certificate Examination for Biomedical Engineering class 1 (CEBM1) assesses comprehensive specialized knowledge and skills centered on the maintenance and safety management of medical devices, systems, and related equipment. This study evaluated the performance of ChatGPT (GPT-4o) on the CEBM1 for comparison with human-level expectations.

Methods
We targeted 171 questions from the 26th to 28th CEBM1s, covering fundamental knowledge, applied knowledge, and problem-solving ability. We input the Japanese version of each question into ChatGPT (GPT-4o) and evaluated performance based on question difficulty. No prompt optimization was used. We compared the responses provided by ChatGPT with the correct answers.

Results
The number of correct answers was 39 (68.4±10.5%) for questions testing fundamental knowledge, 33 (57.9±5.3%) for questions testing applied knowledge, and 38 (59.6±8.0%) for questions testing problem-solving ability. There was no statistically significant difference among the three groups. The passing criterion of 60% or higher was met only for the 28th examination. Over 80% of the incorrect answers were attributable to a lack of knowledge or to incorrect knowledge. When asked about the background causes of, and specific countermeasures for, problems related to medical devices, ChatGPT misunderstood the questions and, in certain cases, generated hallucinated answers.

Conclusions
Currently, ChatGPT possesses a certain level of knowledge in medical engineering; however, it cannot be considered universally accurate in solving all possible problems.
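The scoring described in the Methods and Results (comparing each response with the correct answer, then checking the 60% passing criterion) can be illustrated with a minimal Python sketch. The function names and the toy answer data below are hypothetical and are not taken from the study; the use of the sample standard deviation for the "±" values is also an assumption, since the abstract does not say how the spread was computed.

```python
from statistics import mean, stdev

PASS_THRESHOLD = 0.60  # passing criterion stated in the abstract (60% or higher)


def accuracy(responses, answer_key):
    """Return the fraction of responses that match the answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)


def passes(responses, answer_key, threshold=PASS_THRESHOLD):
    """Return True if the correct-answer rate meets the passing criterion."""
    return accuracy(responses, answer_key) >= threshold


def summarize(rates):
    """Mean and sample SD (an assumption) of per-exam rates, in percent."""
    return mean(rates) * 100, stdev(rates) * 100


# Hypothetical toy data: choice numbers for five multiple-choice questions.
answer_key = [1, 3, 2, 5, 4]
model_responses = [1, 3, 2, 5, 1]

print(accuracy(model_responses, answer_key))
print(passes(model_responses, answer_key))
```

Applying `summarize` to the per-examination rates of one question category would give a mean and spread in the same "mean±SD" form used in the Results, provided the study computed them this way.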