Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study.
Journal:
JMIR medical informatics
Published Date:
May 16, 2025
Abstract
BACKGROUND: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored.