Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study.
Journal:
Journal of Medical Internet Research
Published Date:
Jul 14, 2025
Abstract
BACKGROUND: Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including medical question answering (QA). However, individual LLMs often exhibit varying performance across different medical QA datasets. We benchmarked individual zero-shot LLMs (GPT-4, Llama2-13B, Vicuna-13B, MedLlama-13B, and MedAlpaca-13B) to assess their baseline performance. Within this benchmark, GPT-4 achieves the best accuracy of 71% on MedMCQA (a medical multiple-choice question-answering dataset), Vicuna-13B achieves 89.5% on PubMedQA (a dataset for biomedical question answering), and MedAlpaca-13B achieves the best result among all models at 70% on a third task. No single model dominates across tasks, which highlights the need for strategies that can harness their collective strengths. Ensemble learning methods, which combine multiple models to improve overall accuracy and reliability, offer a promising approach to this challenge.
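To illustrate the ensemble idea the abstract refers to, the sketch below shows one common strategy, majority voting over the answers of several models on a multiple-choice question. The model names and answers here are hypothetical placeholders; the paper's actual ensemble method may combine models differently.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer; ties are broken by
    the order in which answers first appear."""
    counts = Counter(answers)
    top = max(counts.values())
    for a in answers:
        if counts[a] == top:
            return a

# Hypothetical per-model answers to one multiple-choice question
model_answers = {
    "GPT-4": "B",
    "Llama2-13B": "C",
    "Vicuna-13B": "B",
    "MedAlpaca-13B": "B",
}

print(majority_vote(list(model_answers.values())))  # prints "B"
```

Even this simple scheme can outperform any single model when the models' errors are not strongly correlated, which is the motivation for ensembling stated in the abstract.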