Evaluating ChatGPT's recommendations for systematic treatment decisions in recurrent or metastatic head and neck squamous cell carcinoma: Perspectives from experts and junior doctors.
Journal:
International journal of cancer
Published Date:
Jul 19, 2025
Abstract
This study evaluates ChatGPT-4's potential as a decision-support tool in the treatment of recurrent or metastatic head and neck squamous cell carcinoma (HNSCC). Twelve retrospectively selected patients with detailed clinical, tumor, treatment-history, imaging, pathology, and symptom data were included. ChatGPT-4, six experts, and 10 junior oncologists assessed these cases. The AI model applied the 8th edition AJCC TNM criteria for tumor staging and proposed treatment strategies. Both expert and junior oncologists rated its performance on a 0-100 scale, and the ratings were analyzed using score statistics and intraclass correlation coefficients. ChatGPT-4 achieved an 83.3% accuracy rate in tumor staging, with two instances of mis-staging. Junior doctors rated its staging performance highly, with strong consensus on its language capabilities and moderate consensus on its value for learning assistance. For treatment strategy, experts showed high agreement on subject knowledge (median 86, mean 84.7), logical reasoning (median 83, mean 82), and analytical skills (median 85, mean 82), and moderate agreement on ChatGPT-4's usefulness for treatment decisions (median 80, mean 77) and on its recommendations (median 80, mean 76.8). Junior doctors rated ChatGPT-4's treatment strategy higher (medians of 85 or above) but with limited consensus (subject knowledge: median 88, mean 84.5; logical reasoning: median 90, mean 83.2; analytical skills: median 90, mean 82.5; usefulness: median 85, mean 81.8; agreement with recommendations: median 85, mean 80.4). ChatGPT-4 is proficient in tumor staging but only moderately effective in treatment recommendations. Nonetheless, it shows promise as a supportive tool for clinicians, particularly those with less experience, in making informed treatment decisions.
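The reported staging accuracy follows directly from the case counts given in the abstract (12 patients, two mis-staged). A minimal sketch of that arithmetic is shown here; the function name `staging_accuracy` is a hypothetical helper introduced for illustration and is not taken from the study.

```python
def staging_accuracy(n_cases: int, n_misstaged: int) -> float:
    """Return the percentage of cases staged correctly.

    Hypothetical helper for illustration; not part of the study's methods.
    """
    if n_cases <= 0 or not 0 <= n_misstaged <= n_cases:
        raise ValueError("counts must satisfy 0 <= n_misstaged <= n_cases, n_cases > 0")
    return 100.0 * (n_cases - n_misstaged) / n_cases


# Counts taken from the abstract: 12 patients, two instances of mis-staging.
accuracy = staging_accuracy(12, 2)
print(f"{accuracy:.1f}%")  # 83.3%
```

With so few cases, each additional mis-staged patient shifts the accuracy by roughly 8.3 percentage points, which is worth keeping in mind when reading the headline figure.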