Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, including medical question answering (QA). However, individual LLMs often perform unevenly across different medical QA datasets. We benchmarked individual zero-shot LLMs (GPT-4, Llama2-13B, Vicuna-13B, MedLlama-13B, and MedAlpaca-13B) to assess their baseline performance. Within the benchmark, GPT-4 achieves the best score of 71% on MedMCQA (a medical multiple-choice question answering dataset), Vicuna-13B achieves 89.5% on PubMedQA (a dataset for biomedical question answering), and MedAlpaca-13B achieves the best score of 70% among all models. These uneven results show the potential for better performance across tasks and highlight the need for strategies that can harness the models' collective strengths. Ensemble learning methods, which combine multiple models to improve overall accuracy and reliability, offer a promising approach to this challenge.
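
To make the idea of combining models concrete, the sketch below shows plain majority voting over the answers several models give to one multiple-choice question. It is a generic illustration only, not the ensemble method evaluated in the study; the model names, the answers, and the `majority_vote` helper are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical per-model answers to a single multiple-choice question.
model_answers = {"model_a": "B", "model_b": "C", "model_c": "B"}

ensemble_answer = majority_vote(list(model_answers.values()))
print(ensemble_answer)
```

Unweighted voting treats every model as equally reliable; a weighted variant would be a natural extension when models differ in accuracy by dataset, as the benchmark figures suggest.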

Authors

  • Han Yang
    Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China.
  • Mingchen Li
    Division of Computational Health Sciences, Department of Surgery, University of Minnesota, Minneapolis, MN, USA.
  • Huixue Zhou
    Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.
  • Yongkang Xiao
    Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA.
  • Qian Fang
    School of Mechanical and Electrical Engineering, Henan University of Science and Technology, Luoyang 471000, China.
  • Shuang Zhou
    NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing 100021, PR China. Electronic address: szhoupku@gmail.com.
  • Rui Zhang
    Department of Cardiology, Zhongda Hospital, Medical School of Southeast University, Nanjing, China.