Can AI chatbots guide patients and physicians about neck pain? A reliability and readability comparison of ChatGPT-4 and Gemini.
Journal:
Journal of back and musculoskeletal rehabilitation
Published Date:
Mar 17, 2026
Abstract
Background: Artificial intelligence (AI)-based chatbots are increasingly used as sources of medical information. Given the high prevalence of neck pain as a musculoskeletal symptom, patients may commonly consult such tools for health-related guidance.
Objective: To evaluate and compare the performance of ChatGPT 4.0 and Google Gemini in addressing commonly asked patient questions and clinical case scenarios related to neck pain, focusing on their accuracy, quality, understandability, readability, reliability, and usability.
Methods: Twenty-four patient-oriented questions and four clinical case scenarios regarding neck pain were submitted to ChatGPT 4.0 and Google Gemini. Responses were evaluated using validated tools: modified DISCERN (mDISCERN) for reliability, Global Quality Scale (GQS) for quality, PEMAT-P for understandability and actionability, and Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) for readability. Case-based responses were assessed for accuracy, safety, and usability on a 7-point Likert scale by two experienced physicians.
Results: Gemini demonstrated significantly higher reliability (mDISCERN, p < 0.001), whereas ChatGPT 4.0 had slightly higher GQS and PEMAT-P scores, although these differences were not statistically significant. Readability metrics were similar: ChatGPT's FRE was 48.78 and FKGL 9.08; Gemini's FRE was 47.12 and FKGL 9.11. Both models' outputs were considered difficult to read. In clinical scenarios, both chatbots showed comparable accuracy, safety, and usability, with minor omissions noted.
Conclusion: ChatGPT 4.0 and Google Gemini provided similar performance in addressing neck pain-related queries. While both may support patient.
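Note on the readability metrics: the abstract does not say which software computed FRE and FKGL or how syllables were counted. As a minimal sketch, assuming the standard published Flesch formulas, the two scores can be computed from word, sentence, and syllable counts as follows. The function names and the interpretation bands in `fre_band` are illustrative assumptions (the bands follow the commonly cited Flesch table, not cutoffs stated by the authors).

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def fre_band(score: float) -> str:
    """Map an FRE score to the commonly cited Flesch difficulty bands.

    Assumption: these cutoffs come from the widely used Flesch table,
    not from the study itself.
    """
    if score >= 90:
        return "very easy"
    if score >= 80:
        return "easy"
    if score >= 70:
        return "fairly easy"
    if score >= 60:
        return "standard"
    if score >= 50:
        return "fairly difficult"
    if score >= 30:
        return "difficult"
    return "very confusing"


if __name__ == "__main__":
    # Hypothetical counts for a chatbot answer (not data from the study).
    words, sentences, syllables = 100, 5, 150
    print(flesch_reading_ease(words, sentences, syllables))
    print(flesch_kincaid_grade(words, sentences, syllables))
    # FRE values reported in the abstract for ChatGPT 4.0 and Gemini.
    print(fre_band(48.78), fre_band(47.12))
```

Under these assumed bands, both reported FRE values (48.78 and 47.12) fall in the 30 to 50 range, which is consistent with the abstract's description of the outputs as difficult to read.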