Multimodal large language models as assistance for evaluation of thyroid-associated ophthalmopathy.
Journal: Computers in Biology and Medicine
Published Date: May 1, 2025
Abstract
This study evaluated the potential of multimodal AI chatbots, specifically ChatGPT-4o, for assessing thyroid-associated ophthalmopathy (TAO) with the Clinical Activity Score (CAS). Using publicly available case reports and datasets, ChatGPT-4o was tasked with generating a web-based CAS calculator and with estimating CAS from external ocular photographs. Its estimates were compared with CAS evaluations by ophthalmologists, with another chatbot (Gemini Advanced), and with convolutional neural network (CNN) models, including ResNet50. Receiver operating characteristic (ROC) areas under the curve (AUCs) were calculated for detecting active TAO (CAS ≥ 3). ChatGPT-4o demonstrated high accuracy, with mean absolute errors of 0.39 and 0.45 relative to reference ophthalmologist scores in the two datasets, and outperformed both Gemini Advanced and ResNet50 in identifying active TAO. In the preoperative and pre-treatment datasets, ChatGPT-4o achieved ROC-AUCs of 0.974 and 0.990, respectively, significantly exceeding those of ResNet50 (0.770 and 0.623). ChatGPT-4o and Customized GPTs produced identical results, suggesting robust performance without further customization. The chatbot effectively processed both text- and image-based inputs, provided detailed explanations for its CAS estimates, and created a user-friendly calculator for rapid, accessible TAO evaluation. ChatGPT-4o may thus offer a reliable tool for TAO assessment that outperforms traditional CNN-based models, and its ability to generate a CAS calculator without prior training or coding expertise highlights its practical utility in clinical ophthalmology. Limitations of this study include the small sample size, the lack of real-world validation, reliance on photographs without patient metadata, and challenges in repeatability. Future studies should validate its effectiveness in real-world clinical settings.
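For readers unfamiliar with the scoring and metrics involved, the following is a minimal Python sketch. It is not the calculator ChatGPT-4o generated in the study. It assumes the widely used 7-item CAS, which awards one point each for spontaneous retrobulbar pain, pain on attempted up- or down-gaze, eyelid redness, conjunctival redness, eyelid swelling, inflammation of the caruncle and/or plica, and conjunctival edema (chemosis). It also illustrates how mean absolute error and ROC-AUC could be computed against reference scores. All field and function names are hypothetical. The AUC uses the Mann-Whitney formulation rather than any particular library, and the example assumes that the model's CAS serves as the continuous score while the reference CAS ≥ 3 defines the label.

```python
from dataclasses import dataclass, fields


# Hypothetical field names for the 7-item CAS; one point per finding present.
@dataclass
class CASFindings:
    spontaneous_retrobulbar_pain: bool = False
    pain_on_gaze: bool = False                # pain on attempted up- or down-gaze
    eyelid_redness: bool = False
    conjunctival_redness: bool = False
    eyelid_swelling: bool = False
    caruncle_or_plica_inflammation: bool = False
    chemosis: bool = False                    # conjunctival edema


def clinical_activity_score(f: CASFindings) -> int:
    """Sum one point per positive finding, giving a score from 0 to 7."""
    return sum(int(getattr(f, fld.name)) for fld in fields(f))


def is_active(cas: int, threshold: int = 3) -> bool:
    """Active TAO under the cutoff used in the abstract (CAS >= 3)."""
    return cas >= threshold


def mean_absolute_error(pred: list[float], ref: list[float]) -> float:
    """Average absolute difference between predicted and reference CAS."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def roc_auc(scores: list[float], labels: list[bool]) -> float:
    """ROC-AUC as the Mann-Whitney statistic: the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    eye = CASFindings(eyelid_redness=True, conjunctival_redness=True, chemosis=True)
    cas = clinical_activity_score(eye)
    print(cas, is_active(cas))                                # 3 True

    # Illustrative numbers only, not data from the study.
    model_cas = [1, 3, 4, 0, 2]
    reference_cas = [1, 2, 4, 1, 3]
    print(mean_absolute_error(model_cas, reference_cas))      # 0.6
    labels = [is_active(c) for c in reference_cas]
    print(round(roc_auc(model_cas, labels), 3))               # 0.833
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for small photo sets like those described here. For larger evaluations, a rank-based or library implementation would be preferable.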