Comparison of the Performance of GPT-3.5 and GPT-4 With That of Medical Students on the Written German Medical Licensing Examination: Observational Study.

Journal: JMIR Medical Education
Published Date:

Abstract

BACKGROUND: The potential of artificial intelligence (AI)-based large language models, such as ChatGPT, has gained significant attention in the medical field. This enthusiasm is driven not only by recent breakthroughs and improved accessibility, but also by the prospect of democratizing medical knowledge and promoting equitable health care. However, the performance of ChatGPT is substantially influenced by the input language, and given the growing public trust in this AI tool compared to that in traditional sources of information, investigating its medical accuracy across different languages is of particular importance.

Authors

  • Annika Meyer
    Institute for Clinical Chemistry, University Hospital Cologne, Cologne, Germany.
  • Janik Riese
    Department of General Surgery, Visceral, Thoracic and Vascular Surgery, University Hospital Greifswald, Greifswald, Germany.
  • Thomas Streichert
    Institute for Clinical Chemistry, University Hospital Cologne, Cologne, Germany.