Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: Large language models (LLMs) have raised both interest and concern in the academic community. They offer the potential to automate literature search and synthesis for systematic reviews, but they also raise reliability concerns, as their tendency to generate unsupported (hallucinated) content persists.

Authors

  • Mikaël Chelli
    University Institute of Locomotion and Sport, University Hospital of Nice, Nice, France. Electronic address: mikael.chelli@gmail.com.
  • Jules Descamps
    Orthopedic and Traumatology Unit, Hospital Lariboisière, Assistance Publique-Hôpitaux de Paris, Paris, France.
  • Vincent Lavoué
    Department of Obstetrics, Gynecology and Human Reproduction, University of Rennes, Rennes, France.
  • Christophe Trojani
    Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France.
  • Michel Azar
    Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France.
  • Marcel Deckert
    Université Côte d'Azur, INSERM, C3M, Team Microenvironment, Signalling and Cancer, Nice, France.
  • Jean-Luc Raynier
    Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France.
  • Gilles Clowez
    Institute for Sports and Reconstructive Bone and Joint Surgery, Groupe Kantys, Nice, France.
  • Pascal Boileau
    University Institute of Locomotion and Sport, University Hospital of Nice, Nice, France.
  • Caroline Ruetsch-Chelli
    Université Côte d'Azur, INSERM, C3M, Team Microenvironment, Signalling and Cancer, Nice, France.