Management of Dupuytren's Disease: A Multi-Centric Comparative Analysis Between Experienced Hand Surgeons Versus Artificial Intelligence.

Journal: Diagnostics (Basel, Switzerland)
Published Date:

Abstract

Background: Dupuytren's disease, a fibroproliferative disorder affecting the hand's palmar fascia, leads to progressive finger contractures and functional limitations. Management of this condition relies heavily on the expertise of hand surgeons, who tailor interventions based on clinical assessment. With the growing interest in artificial intelligence (AI) in medical decision-making, this study aims to evaluate the feasibility of integrating AI into the clinical management of Dupuytren's disease by comparing AI-generated recommendations with those of expert hand surgeons.

Methods: This multicentric comparative study involved three experienced hand surgeons and five AI systems (ChatGPT, Gemini, Perplexity, DeepSeek, and Copilot). Twenty-two standardized clinical prompts representing various Dupuytren's disease scenarios were used to assess decision-making. Surgeons and AI systems provided management recommendations, which were analyzed for concordance, rationale, and predicted outcomes. Key metrics included union accuracy, surgeon agreement, precision, recall, and F1 scores. The study also evaluated AI performance in unanimous versus non-unanimous cases and inter-AI agreements.

Results: Gemini and ChatGPT demonstrated the highest union accuracy (86.4% and 81.8%, respectively), while Copilot showed the lowest (40.9%). Surgeon agreement was highest for Gemini (45.5%) and ChatGPT (42.4%). AI systems performed better in unanimous cases (accuracy up to 92.0%) than in non-unanimous cases (accuracy as low as 35.0%). Inter-AI agreements ranged from 75.0% (ChatGPT-Gemini) to 48.0% (DeepSeek-Copilot). Precision, recall, and F1 scores were consistently higher for ChatGPT and Gemini than for other systems.

Conclusions: AI systems, particularly Gemini and ChatGPT, show promise in aligning with expert surgical recommendations, especially in straightforward cases. However, significant variability exists, particularly in complex scenarios. AI should be viewed as complementary to clinical judgment, requiring further refinement and validation for integration into clinical practice.
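
The abstract names its metrics but does not define them. The sketch below shows one plausible reading, not the authors' actual method: union accuracy as the share of cases where an AI recommendation matches at least one surgeon, and surgeon agreement as the share of individual surgeon answers the AI matches. Under that reading the reported figures are arithmetically consistent with 22 cases and 3 surgeons (for example, 19/22 = 86.4% and 30/66 = 45.5%), though the paper's exact definitions may differ. The function names, the labels "A"/"B"/"C", the toy data, and the use of a majority-vote reference for precision, recall, and F1 are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def union_accuracy(ai: Sequence[str], surgeons: Sequence[Sequence[str]]) -> float:
    """Share of cases where the AI answer matches at least one surgeon (assumed definition)."""
    hits = sum(1 for answer, panel in zip(ai, surgeons) if answer in panel)
    return hits / len(ai)


def surgeon_agreement(ai: Sequence[str], surgeons: Sequence[Sequence[str]]) -> float:
    """Share of individual (case, surgeon) answers that equal the AI answer (assumed definition)."""
    total = sum(len(panel) for panel in surgeons)
    matches = sum(
        sum(1 for s in panel if s == answer) for answer, panel in zip(ai, surgeons)
    )
    return matches / total


def precision_recall_f1(
    ai: Sequence[str], reference: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Per-label precision, recall, and F1 of the AI against one reference answer per case."""
    tp = sum(1 for a, r in zip(ai, reference) if a == label and r == label)
    fp = sum(1 for a, r in zip(ai, reference) if a == label and r != label)
    fn = sum(1 for a, r in zip(ai, reference) if a != label and r == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 4 cases, 3 surgeons each, one AI system.
# "A", "B", "C" stand in for management options; they are not from the study.
surgeons = [("A", "A", "A"), ("A", "B", "B"), ("C", "C", "A"), ("B", "B", "B")]
ai = ["A", "A", "B", "B"]
# Majority surgeon answer per case, used here as the reference (an assumption).
reference = ["A", "B", "C", "B"]

print("union accuracy:", union_accuracy(ai, surgeons))
print("surgeon agreement:", surgeon_agreement(ai, surgeons))
print("P/R/F1 for label B:", precision_recall_f1(ai, reference, "B"))
```

In the toy data, the second case counts as a union hit even though the AI matches only one of three surgeons. This shows how union accuracy can sit well above surgeon agreement when the surgeons themselves disagree.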

Authors

  • Ishith Seth
    Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia.
  • Gianluca Marcaccini
    Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia.
  • Kaiyang Lim
    Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia.
  • Marco Castrechini
    Plastic Surgery Unit, Department of Surgery "P. Valdoni", "Sapienza" University of Rome, 00185 Rome, Italy.
  • Roberto Cuomo
    Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, 53100 Siena, Italy.
  • Sally Kiu-Huen Ng
    Department of Plastic and Reconstructive Surgery, Austin Health, Heidelberg, VIC 3199, Australia.
  • Richard J Ross
    Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia.
  • Warren M Rozen
    Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, Australia.
