Dr. Chatbot: Investigating the Quality and Quantity of Responses Generated by Three AI Chatbots to Prompts Regarding Carpal Tunnel Syndrome.
Journal:
Cureus
Published Date:
Mar 24, 2025
Abstract
Introduction
The objective of this study is to investigate the number and accuracy of statements provided by AI chatbots in answer to prompts about carpal tunnel syndrome. To the authors' knowledge, this is the first study to assess the answers provided by the OpenAI™ ChatGPT-4o model, AMBOSS™ GPT, and Google™ Gemini to common patient-based questions regarding carpal tunnel syndrome, using UpToDate as a standard reference.
Objective
To determine which chatbot produces the most medically accurate responses. The authors hypothesize that AMBOSS GPT, the paid upgrade to ChatGPT-4o, will produce the most accurate responses compared with the two free chatbots, ChatGPT-4o and the Google Gemini 1.5 Flash model.
Main outcome measures
The number of statements generated by each chatbot and the percentage of those statements that could be directly verified using exact quotations from supporting information available on UpToDate as of December 2024.
Results
There was a significant difference in the average number of statements provided per prompt by the three chatbots: GPT-4o produced 8.9 more statements than AMBOSS GPT (p = 0.0081916), GPT-4o produced 19.65 more statements than Gemini (p = 0.0000001), and AMBOSS GPT produced 10.75 more statements than Gemini (p < 0.0000001). There was also a significant difference in the percentage of statements that could be verified: AMBOSS GPT (85.97%) vs. GPT-4o (71.76%) and Gemini (73.53%), with differences of 14.22 percentage points (p = 0.0000002) and 12.44 percentage points (p = 0.0003969), respectively.
Conclusions
This study demonstrated that, among the three AI chatbots (AMBOSS GPT, GPT-4o, and Google Gemini), GPT-4o produced the most information per prompt; however, AMBOSS GPT provided a larger percentage of information that could be verified against information available in UpToDate.
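The second main outcome measure reduces to a simple ratio, and the reported pairwise differences in statement counts can be cross-checked arithmetically, since the GPT-4o vs. Gemini gap should equal the sum of the other two gaps. The Python sketch below is illustrative only. The function name `percent_verified` and all variable names are hypothetical, the inputs are the rounded values reported in this abstract, and the sketch does not reproduce the statistical tests behind the p-values, which the abstract does not specify.

```python
import math


def percent_verified(n_verified: int, n_total: int) -> float:
    """Hypothetical helper: share of a chatbot's statements that could be
    matched to an exact UpToDate quotation, as a percentage."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_verified / n_total


# Rounded verification percentages as reported in the abstract.
verified_pct = {"AMBOSS GPT": 85.97, "GPT-4o": 71.76, "Gemini": 73.53}

# Reported differences in average statements per prompt.
diff_gpt4o_vs_amboss = 8.9
diff_amboss_vs_gemini = 10.75
diff_gpt4o_vs_gemini = 19.65

# Differences are additive: (GPT-4o - Gemini) = (GPT-4o - AMBOSS) + (AMBOSS - Gemini).
implied = diff_gpt4o_vs_amboss + diff_amboss_vs_gemini
consistent = math.isclose(implied, diff_gpt4o_vs_gemini)
print(f"Implied GPT-4o vs Gemini gap: {implied:.2f} (reported {diff_gpt4o_vs_gemini}); consistent: {consistent}")

# Percentage-point gaps recomputed from the rounded percentages.
gaps = {}
for other in ("GPT-4o", "Gemini"):
    gaps[other] = round(verified_pct["AMBOSS GPT"] - verified_pct[other], 2)
    print(f"AMBOSS GPT vs {other}: {gaps[other]:.2f} percentage points")
```

Because the inputs are rounded, the AMBOSS GPT vs. GPT-4o gap recomputes to 14.21 percentage points rather than the reported 14.22; the 0.01 discrepancy presumably reflects rounding of the underlying unrounded values.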