How does artificial intelligence master urological board examinations? A comparative analysis of different Large Language Models' accuracy and reliability in the 2022 In-Service Assessment of the European Board of Urology.

Journal: World Journal of Urology
Published Date:

Abstract

PURPOSE: This study compares three Large Language Models (LLMs), evaluating their rate of correct answers (RoCA) and the reliability of their generated answers on a set of urological knowledge-based questions spanning different levels of complexity.

Authors

  • Lisa Kollitsch
    Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria.
  • Klaus Eredics
    Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria.
  • Martin Marszalek
    Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria.
  • Michael Rauchenwald
    Department of Urology and Andrology, Klinik Donaustadt, Vienna, Austria.
  • Sabine D Brookman-May
    Department of Urology, University of Munich, LMU, Munich, Germany.
  • Maximilian Burger
    Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany.
  • Katharina Körner-Riffard
    Department of Urology, Caritas St. Josef Medical Centre, University of Regensburg, Regensburg, Germany.
  • Matthias May
    Department of Urology, St. Elisabeth Hospital Straubing, Brothers of Mercy Hospital, Straubing, Germany. Electronic address: matthias.may@klinikum-straubing.de.