Evaluation of ChatGPT-generated medical responses: A systematic review and meta-analysis.

Journal: Journal of Biomedical Informatics

Published Date: Mar 8, 2024

Abstract

OBJECTIVE: Large language models (LLMs) such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in answering medical questions and provide direction for future research.

Authors

Qiuhong Wei

Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, Chongqing, China; Children Nutrition Research Center, Children's Hospital of Chongqing Medical University, Chongqing, China; National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Child Neurodevelopment and Cognitive Disorders, Chongqing, China.
Zhengxiong Yao

Department of Neurology, Children's Hospital of Chongqing Medical University, Chongqing, China.
Ying Cui

Department of Medicine Chemistry, Logistics College of Chinese People's Armed Police Forces, Tianjin, 300309, China.
Bo Wei

Department of General Surgery, Chinese PLA General Hospital, Beijing 100853, China.
Zhezhen Jin

Mailman School of Public Health, Columbia University in the City of New York, New York, NY 10027, USA.
Ximing Xu

Department of Pharmaceutics, School of Pharmacy, Jiangsu University, Zhenjiang, People's Republic of China.

Keywords

Artificial Intelligence; Communication; Databases, Factual; Reproducibility of Results

External Resources

PubMed ID (PMID): 38462064
