Retrieval augmented generation for 10 large language models and its generalizability in assessing medical fitness.
Journal:
npj Digital Medicine
Published Date:
Apr 5, 2025
Abstract
Large Language Models (LLMs) hold promise for medical applications but often lack domain-specific expertise. Retrieval Augmented Generation (RAG) enables customization by integrating specialized knowledge. This study assessed the accuracy, consistency, and safety of LLM-RAG models in determining surgical fitness and delivering preoperative instructions using 35 local and 23 international guidelines. Ten LLMs, including GPT-3.5, GPT-4, GPT-4o, Gemini, Llama 2, Llama 3, and Claude, were tested across 14 clinical scenarios. A total of 3234 responses were generated and compared with 448 human-generated answers. The GPT-4 LLM-RAG model with international guidelines generated answers within 20 s and achieved the highest accuracy, significantly better than human-generated responses (96.4% vs. 86.6%, p = 0.016). The model also produced no hallucinations and gave more consistent output than humans. This study underscores the potential of GPT-4-based LLM-RAG models to deliver highly accurate, efficient, and consistent preoperative assessments.
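To illustrate the LLM-RAG pattern the abstract describes (guideline passages retrieved and injected into the prompt before generation), the sketch below shows a minimal, self-contained version. The guideline snippets, the keyword-overlap retriever, and the prompt wording are illustrative assumptions, not the paper's actual pipeline, which would typically use embedding-based retrieval over the full guideline documents and send the prompt to each of the ten LLMs under test.

```python
# Minimal sketch of an LLM-RAG pipeline for preoperative assessment.
# All snippets, the retriever, and the prompt template are illustrative
# assumptions; the study's actual retrieval stack is not specified here.

from typing import List

# Hypothetical guideline excerpts standing in for the 35 local / 23
# international perioperative guidelines used in the study.
GUIDELINES = [
    "Patients on anticoagulants may need to pause them before elective surgery.",
    "Fast from solid food for six hours before general anaesthesia.",
    "Poorly controlled diabetes warrants optimisation before elective procedures.",
]

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Rank guideline passages by simple word overlap with the query.
    (A production RAG system would normally use embedding similarity.)"""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def build_prompt(scenario: str) -> str:
    """Assemble the augmented prompt: retrieved guidelines plus the scenario."""
    context = "\n".join(f"- {p}" for p in retrieve(scenario, GUIDELINES))
    return (
        "Using only the guideline excerpts below, assess surgical fitness "
        "and give preoperative instructions.\n"
        f"Guidelines:\n{context}\n"
        f"Scenario: {scenario}\nAnswer:"
    )

if __name__ == "__main__":
    # In the study, a prompt like this would be sent to each LLM under test.
    print(build_prompt("65-year-old on warfarin scheduled for hernia repair"))
```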
Keywords
No keywords available for this article.