RadioRAG: Online Retrieval-augmented Generation for Radiology Question Answering.

Journal: Radiology: Artificial Intelligence

Published Date: Jun 18, 2025

Abstract

Purpose: To evaluate the diagnostic accuracy of various large language models (LLMs) when answering radiology-specific questions with and without access to additional online, up-to-date information via retrieval-augmented generation (RAG).

Materials and Methods: The authors developed Radiology RAG (RadioRAG), an end-to-end framework that retrieves data from authoritative radiologic online sources in real time. RAG incorporates information retrieval from external sources to supplement the initial prompt, grounding the model's response in relevant information. Using 80 questions from the RSNA Case Collection across radiologic subspecialties and 24 additional expert-curated questions with reference standard answers, LLMs (GPT-3.5-turbo, GPT-4, Mistral-7B, Mixtral-8x7B, and Llama3 [8B and 70B]) were prompted with and without RadioRAG in a zero-shot inference scenario (temperature ≤ 0.1, top-p = 1). RadioRAG retrieved context-specific information from www.radiopaedia.org. Accuracy of LLMs with and without RadioRAG in answering questions from each dataset was assessed. Statistical analyses were performed using bootstrapping while preserving pairing. Additional assessments included comparison of model performance with human performance and comparison of the time required for conventional versus RadioRAG-powered question answering.

Results: RadioRAG improved accuracy for some LLMs, including GPT-3.5-turbo (74% [59/80] versus 66% [53/80], FDR = 0.03) and Mixtral-8x7B (76% [61/80] versus 65% [52/80], FDR = 0.02) on the RSNA-RadioQA dataset, with similar trends in the ExtendedQA dataset. For these LLMs, accuracy exceeded that of a human expert (63% [50/80]; FDR ≤ 0.007), whereas it did not for Mistral-7B-instruct-v0.2, Llama3-8B, and Llama3-70B (FDR ≥ 0.21). RadioRAG reduced hallucinations for all LLMs (rates from 6% to 25%). RadioRAG increased estimated response time fourfold.

Conclusion: RadioRAG shows potential to improve LLM accuracy and factuality in radiology question answering by integrating real-time domain-specific data. © RSNA, 2025.
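
The abstract states that statistical analyses used bootstrapping while preserving pairing. The sketch below illustrates one common way to do this, a percentile bootstrap that resamples questions so that each model's with-RAG and without-RAG outcomes stay together. It is not the authors' code. The function name `paired_bootstrap_diff`, the number of resamples, and the seed are assumptions. The per-question outcomes are synthetic: only the marginal counts (59/80 and 53/80 for GPT-3.5-turbo) come from the abstract, and the overlap between the two conditions (50 questions correct in both) is invented for illustration. The FDR correction reported in the abstract is not reproduced here.

```python
import random


def paired_bootstrap_diff(with_rag, without_rag, n_resamples=2000, seed=0):
    """Percentile bootstrap of the accuracy difference (with RAG minus without).

    Questions are resampled with replacement and both outcomes of a sampled
    question are kept together, which preserves the pairing.
    """
    if len(with_rag) != len(without_rag):
        raise ValueError("outcome lists must be paired, one entry per question")
    rng = random.Random(seed)
    n = len(with_rag)
    diffs = []
    for _ in range(n_resamples):
        # Draw question indices, then read both conditions at the same index.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_with = sum(with_rag[i] for i in idx) / n
        acc_without = sum(without_rag[i] for i in idx) / n
        diffs.append(acc_with - acc_without)
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    observed = (sum(with_rag) - sum(without_rag)) / n
    return observed, lower, upper


# Synthetic paired outcomes (1 = correct, 0 = incorrect) for 80 questions.
# ASSUMPTION: 50 correct in both conditions, 9 only with RAG, 3 only without,
# 18 in neither. This matches the reported marginals 59/80 and 53/80 only.
with_rag = [1] * 50 + [1] * 9 + [0] * 3 + [0] * 18
without_rag = [1] * 50 + [0] * 9 + [1] * 3 + [0] * 18

observed, lower, upper = paired_bootstrap_diff(with_rag, without_rag)
print(f"accuracy difference: {observed:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Resampling question indices, rather than resampling each condition independently, keeps the correlation between the two answers to the same question. With real data that usually narrows the interval for the difference compared with an unpaired analysis.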

Authors

Soroosh Tayebi Arasteh

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwelsstr. 30, 52074 Aachen, Germany.
Mahshad Lotfinia

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Pauwelsstr. 30, 52074 Aachen, Germany.
Keno Bressem

Department of Radiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin and Berlin Institute of Health, 10117 Berlin, Germany.
Robert Siepmann

Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany.
Lisa Adams

Department of Radiology, Charité - Universitätsmedizin Berlin, Hindenburgdamm 30, 12203, Berlin, Germany.
Dyke Ferber

Department of Medical Oncology and Internal Medicine VI, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany.
Christiane Kuhl

Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany (J.S., D.B.A., S.N.); Institute of Computer Vision and Imaging, RWTH University Aachen, Pauwelsstrasse 30, 52072 Aachen, Germany (J.S., D.M.); Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (D.T., M.P., F.M., C.K., S.N.); and Faculty of Mathematics and Natural Sciences, Institute of Informatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany (S.C.).
Jakob Nikolas Kather

Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Sven Nebelung

Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany (J.S., D.B.A., S.N.); Institute of Computer Vision and Imaging, RWTH University Aachen, Pauwelsstrasse 30, 52072 Aachen, Germany (J.S., D.M.); Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (D.T., M.P., F.M., C.K., S.N.); and Faculty of Mathematics and Natural Sciences, Institute of Informatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany (S.C.).
Daniel Truhn

Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, Düsseldorf, Germany (J.S., D.B.A., S.N.); Institute of Computer Vision and Imaging, RWTH University Aachen, Pauwelsstrasse 30, 52072 Aachen, Germany (J.S., D.M.); Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany (D.T., M.P., F.M., C.K., S.N.); and Faculty of Mathematics and Natural Sciences, Institute of Informatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany (S.C.).

External Resources

PubMed ID: 40530957
