Benchmarking Human-AI collaboration for common evidence appraisal tools.

Journal: Journal of Clinical Epidemiology
PMID:

Abstract

BACKGROUND AND OBJECTIVE: It is unknown whether large language models (LLMs) can facilitate time- and resource-intensive text-related processes in evidence appraisal. The objective was to quantify the agreement of LLMs with human consensus in appraising the scientific reporting (Preferred Reporting Items for Systematic reviews and Meta-Analyses [PRISMA]) and methodological rigor (A MeaSurement Tool to Assess systematic Reviews [AMSTAR]) of systematic reviews and the design of clinical trials (PRagmatic Explanatory Continuum Indicator Summary 2 [PRECIS-2]), and to identify areas where collaboration between humans and artificial intelligence (AI) would outperform the traditional consensus process of human raters in efficiency.

Authors

  • Tim Woelfle
    Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Neurology, University Hospital Basel, Basel, Switzerland; Translational Imaging in Neurology (ThINk), Department of Biomedical Engineering, University Hospital and University of Basel, Basel, Switzerland. Electronic address: tim.woelfle@usb.ch.
  • Julian Hirt
    Internationale Graduiertenakademie, Institut für Gesundheits- und Pflegewissenschaft, Medizinische Fakultät, Martin-Luther-Universität Halle-Wittenberg, Germany; Fachstelle Demenz, Institut für Angewandte Pflegewissenschaft, Fachbereich Gesundheit, FHS St.Gallen, St.Gallen, Switzerland.
  • Perrine Janiaud
    Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland.
  • Ludwig Kappos
    Department of Neurology, University Hospital Basel, Basel, Switzerland.
  • John P A Ioannidis
    Stanford Prevention Research Center, Department of Medicine, Stanford University, Stanford, CA, USA.
  • Lars G Hemkens
    Pragmatic Evidence Lab, Research Center for Clinical Neuroimmunology and Neuroscience Basel (RC2NB), Basel, Switzerland; Department of Clinical Research, University Hospital Basel and University of Basel, Basel, Switzerland; Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA; Meta-Research Innovation Center Berlin (METRIC-B), Berlin Institute of Health, Berlin, Germany.