Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.

Journal: JMIR Medical Informatics

Published Date: May 16, 2025

Abstract

BACKGROUND: The ability of large language models (LLMs) to assess their own confidence when answering biomedical questions remains underexplored.

Authors

Mahmud Omar

Tel Aviv University, Faculty of Medicine, Tel-Aviv, Israel. Electronic address: Mahmudomar70@gmail.com.
Reem Agbareia

Ophthalmology Department, Hadassah Medical Center, Jerusalem, Israel.
Benjamin S Glicksberg

The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 770 Lexington Ave, 15th Fl, New York, NY, 10065, USA.
Girish N Nadkarni

Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Eyal Klang

Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Keywords

Benchmarking; Cross-Sectional Studies; Humans; Large Language Models; Surveys and Questionnaires

External Resources

PubMed ID: 40378406
