Comparative study of ChatGPT and human evaluators on the assessment of medical literature according to recognised reporting standards.

Journal: BMJ Health & Care Informatics

Abstract

INTRODUCTION: Amid the challenge clinicians face in staying up to date with medical research, artificial intelligence (AI) tools such as the large language model (LLM) ChatGPT could automate the appraisal of research quality, saving time and reducing bias. This study compares the proficiency of ChatGPT3 against human evaluators in scoring abstracts to determine its potential as a tool for evidence synthesis.

Authors

  • Richard HR Roberts
    Reconstructive Surgery and Regenerative Medicine Research Centre, Swansea University, Swansea, UK. 838272@swansea.ac.uk.
  • Stephen R Ali
    Reconstructive Surgery and Regenerative Medicine Research Centre, Swansea University, Swansea, UK.
  • Hayley A Hutchings
    Swansea University Medical School, Swansea University, Swansea, UK.
  • Thomas D Dobbs
    Reconstructive Surgery and Regenerative Medicine Research Centre, Swansea University, Swansea, UK.
  • Iain S Whitaker
    Reconstructive Surgery and Regenerative Medicine Research Centre, Swansea University, Swansea, UK.