The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries.

Journal: Journal of bone and mineral research : the official journal of the American Society for Bone and Mineral Research
PMID:

Abstract

Artificial intelligence (AI) chatbots utilizing large language models (LLMs) have recently garnered significant interest due to their ability to generate humanlike responses to user inquiries in an interactive dialog format. While patients, scientific and medical providers, and trainees increasingly use these models to obtain medical information and address biomedical questions, their performance may vary from field to field. The opportunities and risks these chatbots pose to the widespread understanding of skeletal health and science are unknown. Here we assess the accuracy and quality of responses from 3 high-profile LLM chatbots, Chat Generative Pre-Trained Transformer (ChatGPT) 4.0, BingAI, and Bard, to 30 questions in 3 categories: basic and translational skeletal biology, clinical practitioner management of skeletal disorders, and patient queries. Thirty questions in each of these categories were posed, and responses were independently graded for their degree of accuracy by 4 reviewers. While each of the chatbots was often able to provide relevant information about skeletal disorders, the quality and relevance of these responses varied widely, and ChatGPT 4.0 had the highest overall median score in each of the categories. Each of these chatbots displayed distinct limitations that included inconsistent, incomplete, or irrelevant responses, inappropriate utilization of lay sources in a professional context, a failure to take patient demographics or clinical context into account when providing recommendations, and an inability to consistently identify areas of uncertainty in the relevant literature. Careful consideration of both the opportunities and risks of current AI chatbots is needed to formulate guidelines for best practices for their use as a source of information about skeletal health and biology.

Authors

  • Michelle Cung
    Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY 10065, United States.
  • Branden Sosa
    Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY 10065, United States.
  • He S Yang
    Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY.
  • Michelle M McDonald
    Skeletal Diseases Program, The Garvan Institute of Medical Research, Darlinghurst, 2010, Australia.
  • Brya G Matthews
    Department of Molecular Medicine and Pathology, University of Auckland, Auckland, 1142, New Zealand.
  • Annegreet G Vlug
    Center for Bone Quality, Department of Internal Medicine, Leiden University Medical Center, Leiden, 2300, The Netherlands.
  • Erik A Imel
    Indiana Center for Musculoskeletal Health, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, United States.
  • Marc N Wein
    Endocrine Unit, Massachusetts General Hospital, Boston, MA 02114, United States.
  • Emily Margaret Stein
    Division of Endocrinology, Hospital for Special Surgery, New York, NY 10021, United States.
  • Matthew B Greenblatt
    Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY 10065, United States.