Using a large language model (ChatGPT) to assess risk of bias in randomized controlled trials of medical interventions: protocol for a pilot study of interrater agreement with human reviewers.

Journal: BMC Medical Research Methodology

Published Date: Jul 31, 2025

Abstract

BACKGROUND: Risk of bias (RoB) assessment is an essential part of systematic reviews that requires reading and understanding each eligible trial and the RoB tools. RoB assessment is time-consuming and subject to human error. Machine learning-based tools have been developed to automate RoB assessment using simple models trained on limited corpora. ChatGPT is a conversational agent based on a large language model (LLM) that was trained on an internet-scale corpus and has demonstrated human-like abilities in multiple areas, including healthcare. LLMs might be able to support systematic reviewing tasks such as assessing RoB. We aim to assess interrater agreement in overall (rather than domain-level) RoB assessment between human reviewers and ChatGPT in randomized controlled trials of medical interventions.
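As an illustration of what "interrater agreement" can mean in practice, the sketch below computes Cohen's kappa between two raters. This is an assumption for illustration only: the abstract does not state which agreement statistic the protocol uses. The function name `cohens_kappa`, the rating labels, and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")

    n = len(ratings_a)

    # Observed agreement: share of items where both raters give the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels either rater used.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0

    return (observed - expected) / (1 - expected)


# Invented example ratings (NOT data from the study): overall RoB per trial.
human = ["low", "high", "some concerns", "low", "high", "low"]
llm = ["low", "high", "low", "low", "some concerns", "low"]

kappa = cohens_kappa(human, llm)
print(round(kappa, 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when one rating category dominates. A weighted variant may suit ordered RoB categories better; that choice is left open here.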

Authors

Christopher James Rose
Norwegian Institute of Public Health, Skøyen, Norway.

Julia Bidonde
Division of Health Services, Norwegian Institute of Public Health, Oslo, Norway.

Martin Ringsten
Cochrane Sweden, Lund University, Skåne University Hospital, Lund, Sweden.

Julie Glanville
Glanville.info, York, UK.

Rigmor C Berg
Norwegian Institute of Public Health, Oslo, Norway.

Chris Cooper
Bristol Medical School, University of Bristol, Bristol, UK.

Ashley Elizabeth Muller
Norwegian Institute of Public Health, Skøyen, Norway.

Hans Bugge Bergsund
Cluster for Reviews and Health Technology Assessments, Norwegian Institute of Public Health, Oslo, Norway.

Jose F Meneses-Echavez
Cluster for Reviews and Health Technology Assessments, Norwegian Institute of Public Health, Oslo, Norway.

Thomas Potrebny
Section for Evidence-Based Practice, Western Norway University of Applied Sciences, Bergen, Norway.

Keywords

Bias; Generative Artificial Intelligence; Humans; Language; Large Language Models; Machine Learning; Observer Variation; Pilot Projects; Randomized Controlled Trials as Topic; Research Design; Risk Assessment; Systematic Reviews as Topic

External Resources

PubMed ID: 40745627
