Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.

Journal: Eye (London, England)

PMID: 39690303

Abstract

BACKGROUND/OBJECTIVE: This study aimed to evaluate the accuracy, comprehensiveness, and readability of responses generated by various Large Language Models (LLMs) (ChatGPT-3.5, Gemini, Claude 3, and GPT-4.0) in the clinical context of uveitis, utilizing a meticulous grading methodology.

Authors

Fang-Fang Zhao

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Han-Jie He

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Jia-Jian Liang

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Jingyun Cen

Shaoguan University Medical College, Shaoguan, China.
Yun Wang

Department of Anesthesiology, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, People's Republic of China.
Hongjie Lin

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Feifei Chen

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Tai-Ping Li

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Jian-Feng Yang

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Lan Chen

Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China.
Ling-Ping Cen

Guangdong Provincial Key Laboratory of Medical Immunology and Molecular Diagnostics, School of Medical Technology, Guangdong Medical University, Zhanjiang, China. cenlp@hotmail.com.

Keywords

Benchmarking; Comprehension; Generative Artificial Intelligence; Humans; Language; Large Language Models; Uveitis
