Bridging AI and Medical Expertise: ChatGPT's Success on the Medical Specialization Residency Admission Exam in Spain.

Journal: Studies in health technology and informatics
Published Date:

Abstract

The growing use of Artificial Intelligence (AI) in healthcare, and in particular the potential of generative AI models such as ChatGPT-4, is a trending topic. This study examines how ChatGPT-4 performed on the national medical residency exam in Spain, a highly selective test that grants access to the medical specialization training program known as MIR. ChatGPT-4 answered 210 questions, including 25 that required image interpretation. The chatbot correctly answered 150 out of 200 questions, achieving an estimated ranking of around 1900-2300 out of 11,577 candidates. This performance would allow access to most medical specialties in Spain. No significant differences were found between questions requiring image analysis and those that did not, but ChatGPT-4 struggled with more difficult questions, showing a higher error rate on complex problems, much as human candidates do. Despite its potential as an educational and problem-solving tool, the study highlights ChatGPT-4's limitations, including occasional "AI hallucinations" (incorrect or nonsensical answers) and variability in its responses when the same question was repeated. The study emphasizes that while AI tools such as ChatGPT can assist in education and medical tasks, they cannot replace qualified healthcare professionals, and their output requires careful verification.
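To put the reported figures in proportion, the arithmetic can be sketched as follows. This is only an illustrative calculation from the numbers quoted in the abstract (150 correct out of 200, an estimated rank of 1900-2300 among 11,577 candidates); the function names are hypothetical, and the calculation does not reproduce the official MIR scoring rules, which the abstract does not describe.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


def rank_share(rank: int, candidates: int) -> float:
    """Fraction of candidates at or above the given rank position."""
    return rank / candidates


# Figures as quoted in the abstract.
CORRECT, TOTAL = 150, 200
RANK_BEST, RANK_WORST = 1900, 2300
CANDIDATES = 11_577

print(f"Accuracy: {accuracy(CORRECT, TOTAL):.1%}")  # Accuracy: 75.0%
print(
    "Estimated rank within the top "
    f"{rank_share(RANK_BEST, CANDIDATES):.1%} to "
    f"{rank_share(RANK_WORST, CANDIDATES):.1%} of candidates"
)
```

Under these assumptions, the estimated rank corresponds to roughly the top 16% to 20% of candidates.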

Authors

  • Angela Leis
    Hospital del Mar Research Institute, Barcelona, Spain.
  • Miguel-Angel Mayer
    Hospital del Mar, Barcelona, Spain.
  • Alex Mayer
    Hospital Parc Taulí, Sabadell, Spain.