Towards a Cascaded LLM Framework for Cost-effective Human-AI Decision-Making
Journal:
arXiv
Published Date:
Jun 13, 2025
Abstract
Effective human-AI decision-making balances three key factors: the
\textit{correctness} of predictions, the \textit{cost} of knowledge and
reasoning complexity, and the confidence about whether to \textit{abstain} from
automated answers or involve human experts. In this work, we present a cascaded
LLM decision framework that adaptively delegates tasks across multiple tiers of
expertise -- a base model for initial candidate answers, a more capable and
knowledgeable (but costlier) large model, and a human expert for when the model
cascade abstains. Our method proceeds in two stages. First, a deferral policy
determines whether to accept the base model's answer or regenerate it with the
large model based on the confidence score. Second, an abstention policy decides
whether the cascade's response is sufficiently certain or requires human
intervention. Moreover, we incorporate an online learning mechanism in the
framework that can leverage human feedback to improve decision quality over
time. We demonstrate this approach on general question-answering (ARC-Easy and
ARC-Challenge) and medical question-answering (MedQA and MedMCQA). Our results
show that our cascaded strategy in most cases outperforms single-model
baselines in accuracy while reducing cost and providing a principled way to
handle abstentions.
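
The two-stage decision flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each model is a callable returning an answer together with a confidence score in [0, 1], and it replaces the deferral and abstention policies with two fixed thresholds. The names `cascade`, `CascadeResult`, `defer_threshold`, and `abstain_threshold`, and the default threshold values, are hypothetical. The online learning mechanism is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Assumption: a model maps a question to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    answer: Optional[str]  # None when the cascade abstains
    source: str            # "base", "large", or "human"
    confidence: float      # confidence of the last model consulted


def cascade(
    question: str,
    base_model: Model,
    large_model: Model,
    defer_threshold: float = 0.8,    # hypothetical value
    abstain_threshold: float = 0.6,  # hypothetical value
) -> CascadeResult:
    # Stage 1 (deferral): accept the base model's answer, or regenerate
    # it with the large model when the base confidence is too low.
    answer, conf = base_model(question)
    source = "base"
    if conf < defer_threshold:
        answer, conf = large_model(question)
        source = "large"

    # Stage 2 (abstention): if the cascade's response is still not
    # confident enough, abstain and route the question to a human expert.
    if conf < abstain_threshold:
        return CascadeResult(answer=None, source="human", confidence=conf)
    return CascadeResult(answer=answer, source=source, confidence=conf)
```

Under these assumptions, the cost saving comes from calling the large model only when the base model is unsure, and from involving a human expert only when both stages leave the confidence below the abstention threshold.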