Large Language Model-Assisted Systematic Review: Validation Based on Cochrane Review Data.

Journal: Studies in health technology and informatics

Published Date: May 15, 2025

Abstract

Large Language Models (LLMs) offer potential for automating systematic reviews, a labor-intensive process in evidence-based medicine. We evaluated GPT-4o, GPT-4o-mini, and Llama 3.1:8B on abstract screening and risk of bias assessment using 12 Cochrane drug intervention reviews. GPT-4o achieved the best screening performance (recall 0.894, precision 0.492). We propose a one-shot inclusivity adjustment method that enables threshold modulation without repeated inference. For risk of bias assessment, accuracy varied by domain: it was highest for random sequence generation (0.873) and lowest for selective reporting (0.418). Our findings demonstrate both the practical utility and the current limitations of LLMs in automating systematic reviews.
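The screening metrics and the idea of modulating a threshold without re-running the model can be illustrated with a minimal sketch. The abstract does not describe how the inclusivity adjustment is implemented, so this is not the authors' method: it assumes, purely for illustration, that each abstract receives a single stored inclusion score from one model call, and that inclusivity is changed afterwards by moving a cutoff over those scores. The function names, scores, and labels are all hypothetical toy values.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for include/exclude screening decisions."""
    tp = sum(p and a for p, a in zip(predicted, actual))        # correctly included
    fp = sum(p and not a for p, a in zip(predicted, actual))    # wrongly included
    fn = sum((not p) and a for p, a in zip(predicted, actual))  # wrongly excluded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def screen(scores: List[float], threshold: float) -> List[bool]:
    """Include every abstract whose stored score reaches the threshold."""
    return [s >= threshold for s in scores]


# Hypothetical data: one score per abstract, obtained from a single model call
# (assumption), plus the reference inclusion decisions.
scores = [0.9, 0.7, 0.6, 0.4, 0.2, 0.1]
actual = [True, True, False, True, False, False]

# The same stored scores are reused; only the cutoff changes.
strict = precision_recall(screen(scores, 0.65), actual)
lenient = precision_recall(screen(scores, 0.30), actual)

print("strict  (precision, recall):", strict)
print("lenient (precision, recall):", lenient)
```

In this toy data, lowering the cutoff raises recall at the cost of precision, which is the same kind of trade-off as the reported combination of high recall (0.894) and moderate precision (0.492).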

Authors

Siun Kim

Department of Applied Biomedical Engineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea; Center for Convergence Approaches in Drug Development, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Korea.

Hyung-Jin Yoon

Department of Human Systems Medicine, Seoul National University College of Medicine, Seoul, Korea.

Keywords

Evidence-Based Medicine; Humans; Large Language Models; Natural Language Processing; Programming Languages; Reproducibility of Results; Review Literature as Topic; Systematic Reviews as Topic

External Resources

PubMed ID: 40380609
