Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts.

Journal: JMIR Medical Education
Published Date:

Abstract

BACKGROUND: Accurate medical advice is paramount in ensuring optimal patient care, and misinformation can lead to misguided decisions with potentially detrimental health outcomes. The emergence of large language models (LLMs) such as OpenAI's GPT-4 has spurred interest in their potential health care applications, particularly in automated medical consultation. Yet rigorous investigations comparing their performance with that of human experts remain sparse.

Authors

  • Eunbeen Jo
    Department of Medical Informatics, Korea University College of Medicine, Seoul, Republic of Korea.
  • Sanghoun Song
    Department of Linguistics, Korea University, Seoul, Republic of Korea. sanghoun@korea.ac.kr.
  • Jong-Ho Kim
    Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea.
  • Subin Lim
    Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea.
  • Ju Hyeon Kim
    Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea.
  • Jung-Joon Cha
    Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, Seoul, Republic of Korea.
  • Young-Min Kim
    College of Pharmacy, Chonnam National University, Gwangju 61186, Republic of Korea. Electronic address: u9897854@jnu.ac.kr.
  • Hyung Joon Joo
    Division of Cardiology, Department of Internal Medicine, Korea University Anam Hospital, 73 Inchon-ro, Seongbuk-gu, Seoul 02841, Republic of Korea.