Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.

Journal: Journal of Medical Internet Research
PMID:

Abstract

BACKGROUND: Recent advancements in artificial intelligence, such as GPT-3.5 Turbo (OpenAI) and GPT-4, have demonstrated significant potential by achieving good scores on text-only United States Medical Licensing Examination (USMLE) questions and effectively answering questions from physicians. However, the ability of these models to interpret medical images remains underexplored.

Authors

  • Zhichao Yang
    Guangdong Provincial Key Laboratory of Advanced Biomaterials, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, China.
  • Zonghai Yao
    Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, United States.
  • Mahbuba Tasmin
    College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, United States.
  • Parth Vashisht
    College of Information and Computer Science, University of Massachusetts Amherst, Amherst, MA, United States.
  • Won Seok Jang
    Miner School of Computer & Information Sciences, University of Massachusetts Lowell, Lowell, MA, United States.
  • Feiyun Ouyang
    Miner School of Computer & Information Sciences, University of Massachusetts Lowell, Lowell, MA, United States.
  • Beining Wang
    Institute for Artificial Intelligence, the State Key Laboratory of Intelligence Technology and Systems, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China.
  • David McManus
    Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA, United States.
  • Dan Berlowitz
    Department of Public Health, Zuckerberg College of Health Sciences, University of Massachusetts Lowell, Lowell, MA 01854, USA.
  • Hong Yu
    University of Massachusetts Medical School, Worcester, MA.