A Comparative Analysis of the Accuracy and Readability of Popular Artificial Intelligence-Chat Bots for Inguinal Hernia Management.

Journal: The American Surgeon

Abstract

Background: Artificial intelligence (AI), particularly large language models (LLMs), has gained attention for its clinical applications. While LLMs have shown utility in various medical fields, their performance in inguinal hernia repair (IHR) remains understudied. This study seeks to evaluate the accuracy and readability of LLM-generated responses to IHR-related questions, as well as their performance across distinct clinical categories.

Methods: Thirty questions were developed based on clinical guidelines for IHR and categorized into four subgroups: diagnosis, perioperative care, surgical management, and other. Questions were entered into Microsoft Copilot®, Google Gemini®, and OpenAI ChatGPT-4®. Responses were anonymized and evaluated by six fellowship-trained, minimally invasive surgeons using a validated 5-point Likert scale. Readability was assessed with six validated formulae.

Results: GPT-4 and Gemini outperformed Copilot in overall mean scores for response accuracy (Copilot: 3.75 ± 0.99, Gemini: 4.35 ± 0.82, and GPT-4: 4.30 ± 0.89; P < 0.001). Subgroup analysis revealed significantly higher scores for Gemini and GPT-4 in perioperative care (P = 0.025) and surgical management (P < 0.001). Readability scores were comparable across models, with all responses at college to college-graduate reading levels.

Discussion: This study highlights the variability in LLM performance, with GPT-4 and Gemini producing higher-quality responses than Copilot for IHR-related questions. However, the consistently high reading level of responses may limit accessibility for patients. These findings underscore the potential of LLMs to serve as valuable adjunct tools in surgical practice, with ongoing advancements expected to further enhance their accuracy, readability, and applicability.

Authors

  • Thisun Udagedara
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Ashley Tran
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Sumaya Bokhari
    College of Natural and Agricultural Sciences, University of California, Riverside, CA, USA.
  • Sharon Shiraga
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Stuart Abel
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Caitlin Houghton
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Katie Galvin
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Kamran Samakar
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
  • Luke R Putnam
    Division of Upper GI and General Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA.
