Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Journal: arXiv
Published Date: Mar 20, 2025
Abstract
Large Language Models (LLMs) excel in text generation, reasoning, and
decision-making, enabling their adoption in high-stakes domains such as
healthcare, law, and transportation. However, their reliability is a major
concern, as they often produce plausible but incorrect responses. Uncertainty
quantification (UQ) enhances trustworthiness by estimating confidence in
outputs, enabling risk mitigation and selective prediction. Yet
traditional UQ methods struggle with LLMs because of computational constraints and
decoding inconsistencies. Moreover, LLMs introduce unique uncertainty sources,
such as input ambiguity, reasoning path divergence, and decoding stochasticity,
that extend beyond classical aleatoric and epistemic uncertainty. To address
these challenges, we introduce a new taxonomy that categorizes UQ methods by
computational efficiency and uncertainty dimensions (input, reasoning,
parameter, and prediction uncertainty). We evaluate existing techniques, assess
their real-world applicability, and identify open challenges, emphasizing the
need for scalable, interpretable, and robust UQ approaches to enhance LLM
reliability.
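Selective prediction, mentioned above, can be made concrete with a small sketch. It assumes that the per-token log-probabilities of a generated answer are available from the model's decoder. It scores confidence as the length-normalized sequence probability, exp(mean token log-prob), and abstains when that score falls below a threshold. The function names `sequence_confidence` and `selective_answer` and the default threshold of 0.7 are hypothetical choices for illustration. They are not taken from the survey, and this is only one of many confidence scores the taxonomy covers.

```python
import math
from typing import Optional, Sequence


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Length-normalized sequence probability: exp(mean token log-prob).

    Assumes token_logprobs are natural-log probabilities of the tokens
    the model actually generated (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def selective_answer(
    answer: str,
    token_logprobs: Sequence[float],
    threshold: float = 0.7,  # hypothetical cutoff; would be tuned on held-out data
) -> Optional[str]:
    """Return the answer if its confidence meets the threshold, else abstain (None)."""
    confidence = sequence_confidence(token_logprobs)
    return answer if confidence >= threshold else None


if __name__ == "__main__":
    # High-confidence answer: exp(-0.03) is about 0.97, so it is kept.
    print(selective_answer("Paris", [-0.05, -0.01]))
    # Low-confidence answer: exp(-1.05) is about 0.35, so the system abstains.
    print(selective_answer("Lyon", [-1.2, -0.9]))
```

Length normalization keeps longer answers from being penalized simply for having more tokens. In practice the threshold trades coverage (how often the system answers) against risk (how often its answers are wrong).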