Benchmarking Human-AI collaboration for common evidence appraisal tools.
Journal:
Journal of Clinical Epidemiology
PMID:
39277058
Abstract
BACKGROUND AND OBJECTIVE: It is unknown whether large language models (LLMs) can facilitate time- and resource-intensive, text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus when appraising the scientific reporting (Preferred Reporting Items for Systematic Reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews, as well as the design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]), and to identify areas where collaboration between humans and artificial intelligence (AI) would be more efficient than the traditional consensus process of human raters.