Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.

Journal: PLOS ONE
Published Date:

Abstract

Ankylosing spondylitis (AS), which usually presents in the second and third decades of life, is associated with chronic pain, limited mobility, and a marked decrease in quality of life. This study aimed to comparatively evaluate the readability, information accuracy, quality, and reliability of the answers given by artificial intelligence (AI)-based chatbots such as ChatGPT, Perplexity, and Gemini, which have become popular tools for accessing medical information, to user questions about AS, a chronic inflammatory joint disease. The 25 keywords related to AS most frequently queried according to Google Trends were submitted to each of the three AI-based chatbots. The readability of the resulting responses was evaluated using the Gunning Fog (GFOG), Flesch Reading Ease Score (FRES), and Simple Measure of Gobbledygook (SMOG) indices. The quality of the responses was measured with the Ensuring Quality Information for Patients (EQIP) and Global Quality Score (GQS) instruments, and reliability was measured with the modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched keywords related to AS were "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease", in that order. The readability levels of the answers produced by the AI-based chatbots were above the recommended 6th-grade level and differed significantly among chatbots (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out in terms of information quality and reliability, receiving higher scores than the other chatbots (p < 0.05). The answers given by AI chatbots to AS-related questions exceed the recommended readability level, and several low reliability and quality scores raise concerns. With an appropriate audit mechanism in place, future AI chatbots could achieve sufficient quality and reliability as well as appropriate readability levels.
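The abstract does not state which software the authors used to compute the readability indices. As a rough illustration only, a minimal Python sketch of the standard published formulas for FRES, GFOG, and SMOG, using a naive vowel-group syllable counter, might look like the following; the function names and sample text are hypothetical and are not taken from the study.

```python
import re
from math import sqrt


def count_syllables(word: str) -> int:
    """Naive syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_scores(text: str) -> dict:
    """Return FRES, Gunning Fog (GFOG), and SMOG scores for a block of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words, n_syll = len(sentences), len(words), sum(syllables)
    # "Complex" / polysyllabic words: three or more syllables
    n_poly = sum(1 for s in syllables if s >= 3)

    fres = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)
    gfog = 0.4 * ((n_words / n_sent) + 100 * (n_poly / n_words))
    smog = 1.0430 * sqrt(n_poly * (30 / n_sent)) + 3.1291
    return {"FRES": round(fres, 1), "GFOG": round(gfog, 1), "SMOG": round(smog, 1)}


if __name__ == "__main__":
    sample = ("Ankylosing spondylitis is a chronic inflammatory disease of the spine. "
              "It causes pain and stiffness, and treatment focuses on exercise and medication.")
    print(readability_scores(sample))
```

Lower GFOG and SMOG values and higher FRES values indicate easier text; a US 6th-grade reading level, the threshold referenced in the abstract, roughly corresponds to a FRES of about 80 or higher.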

Authors

  • Mete Kara
    Izmir City Hospital, Internal Medicine, Rheumatology, Izmir, Turkey.
  • Erkan Ozduran
    Physical Medicine and Rehabilitation, Pain Medicine, Sivas Numune Hospital, Sivas, Turkey.
  • Müge Mercan Kara
    Izmir City Hospital, Neurology, Pain Medicine, Izmir, Turkey.
  • İlhan Celil Özbek
    Physical Medicine and Rehabilitation, Health Science University, Derince Education and Research Hospital, Kocaeli, Turkey.
  • Volkan Hancı
    Clinic of Anesthesiology and Critical Care, Sincan Education and Research Hospital, Ankara, Turkey.