Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
Journal:
Eye (London, England)
PMID:
39690303
Abstract
BACKGROUND/OBJECTIVE: This study aimed to evaluate the accuracy, comprehensiveness, and readability of responses generated by various Large Language Models (LLMs) (ChatGPT-3.5, Gemini, Claude 3, and GPT-4.0) in the clinical context of uveitis, utilizing a meticulous grading methodology.