Performance of ChatGPT on the Brazilian Radiology and Diagnostic Imaging and Mammography Board Examinations.

Journal: Radiology: Artificial Intelligence

Abstract

This prospective exploratory study, conducted from January 2023 through May 2023, evaluated the ability of ChatGPT to answer questions from Brazilian radiology board examinations and explored how different prompt strategies influence the performance of GPT-3.5 and GPT-4. Three multiple-choice board examinations that did not include image-based questions were evaluated: radiology and diagnostic imaging, mammography, and neuroradiology. Five styles of zero-shot prompting were tested: raw question, brief instruction, long instruction, chain of thought, and question-specific automatic prompt generation (QAPG). The QAPG and brief instruction prompt strategies performed best for all examinations (P < .05), obtaining passing scores (≥60%) on the radiology and diagnostic imaging examination with both versions of ChatGPT. The QAPG style achieved a score of 60% on the mammography examination using GPT-3.5 and 76% using GPT-4. GPT-4 achieved a score of up to 65% on the neuroradiology examination. The long instruction style consistently underperformed, suggesting that excessive detail may harm performance. GPT-4's scores were less sensitive to changes in prompt style. The QAPG prompt style selected option "A" at a high rate, but the difference was not statistically significant, suggesting no evidence of answer bias. GPT-4 passed all three radiology board examinations, and GPT-3.5 passed two of the three examinations when using an optimal prompt style.

Keywords: ChatGPT, Artificial Intelligence, Board Examinations, Radiology and Diagnostic Imaging, Mammography, Neuroradiology

© RSNA, 2023. See also the commentary by Trivedi and Gichoya in this issue.

Authors

  • Leonardo C Almeida
    From the Department of Artificial Intelligence and Management (L.C.A., E.M.J.M.F., N.A., F.C.K.), Graduate Program in Medicine (Clinical Radiology), Universidade Federal de São Paulo (UNIFESP), Rua Botucatu, 740, 04023-062, São Paulo, São Paulo, Brazil; AI Lab (L.C.A., E.M.J.M.F., P.E.A.K., F.C.K.), Dasa, São Paulo, São Paulo, Brazil.
  • Eduardo M J M Farina
    From the Department of Artificial Intelligence and Management (L.C.A., E.M.J.M.F., N.A., F.C.K.), Graduate Program in Medicine (Clinical Radiology), Universidade Federal de São Paulo (UNIFESP), Rua Botucatu, 740, 04023-062, São Paulo, São Paulo, Brazil; AI Lab (L.C.A., E.M.J.M.F., P.E.A.K., F.C.K.), Dasa, São Paulo, São Paulo, Brazil.
  • Paulo E A Kuriki
    DasaInova, Dasa, Av. das Nações Unidas, São Paulo SP, Brazil. Electronic address: paulo.kuriki.ext@dasa.com.br.
  • Nitamar Abdala
    From the Department of Artificial Intelligence and Management (L.C.A., E.M.J.M.F., N.A., F.C.K.), Graduate Program in Medicine (Clinical Radiology), Universidade Federal de São Paulo (UNIFESP), Rua Botucatu, 740, 04023-062, São Paulo, São Paulo, Brazil.
  • Felipe C Kitamura
    From the Department of Artificial Intelligence and Management (L.C.A., E.M.J.M.F., N.A., F.C.K.), Graduate Program in Medicine (Clinical Radiology), Universidade Federal de São Paulo (UNIFESP), Rua Botucatu, 740, 04023-062, São Paulo, São Paulo, Brazil; AI Lab (L.C.A., E.M.J.M.F., P.E.A.K., F.C.K.), Dasa, São Paulo, São Paulo, Brazil.