Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.

Journal: JMIR Medical Informatics
Published Date:

Abstract

BACKGROUND: The ability of large language models (LLMs) to assess their own confidence when answering biomedical questions remains underexplored.

Authors

  • Mahmud Omar
    Tel Aviv University, Faculty of Medicine, Tel Aviv, Israel. Electronic address: Mahmudomar70@gmail.com.
  • Reem Agbareia
    Ophthalmology Department, Hadassah Medical Center, Jerusalem, Israel.
  • Benjamin S Glicksberg
    The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 770 Lexington Ave, 15th Fl, New York, NY, 10065, USA.
  • Girish N Nadkarni
    Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Eyal Klang
    Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.