HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
Journal:
arXiv
Published Date:
Mar 1, 2025
Abstract
In the dynamic landscape of artificial intelligence, the exploration of
hallucinations within vision-language (VL) models emerges as a critical
frontier. This work examines the hallucinatory phenomena exhibited by widely
used image captioners, uncovering interesting patterns.
Specifically, we build upon previously introduced techniques of conceptual
counterfactual explanations to address VL hallucinations. The employed
conceptual counterfactual backbone is deterministic and efficient, and it
suggests semantically minimal edits driven by hierarchical knowledge, so that
the transition from a hallucinated caption to a non-hallucinated one is
performed in a black-box manner. HalCECE, our proposed hallucination detection
framework, is highly interpretable: it provides semantically meaningful edits
in addition to standalone numbers, while the hierarchical decomposition of
hallucinated concepts leads to a thorough hallucination analysis. Another
novelty tied to the current work is the investigation of role hallucinations,
making it one of the first works to involve interconnections between visual
concepts in hallucination detection. Overall, HalCECE offers an explainable
direction for the crucial field of VL hallucination detection, thus fostering
trustworthy evaluation of current and future VL systems.
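
To make the idea of semantically minimal edits driven by hierarchical
knowledge more concrete, the following is a minimal sketch and not the HalCECE
implementation. It assumes a hypothetical toy concept hierarchy (PARENT), a
hypothetical cost function (replace_cost) that counts hierarchy edges between
two concepts via their lowest common ancestor, and a brute-force search
(minimal_edits) that only handles one-to-one replacements between equally
sized concept sets. All names and the hierarchy itself are illustrative
assumptions.

```python
from itertools import permutations

# Hypothetical toy hierarchy (child -> parent); an assumption for illustration,
# not the knowledge source used in the paper.
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "car": "vehicle",
    "bus": "vehicle",
    "vehicle": "entity",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def replace_cost(a, b):
    """Count hierarchy edges between a and b via their lowest common ancestor."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def minimal_edits(caption_concepts, image_concepts):
    """Brute-force the cheapest one-to-one replacement of caption concepts.

    Simplification: insertions and deletions are not modelled, so both
    concept lists must have the same length.
    """
    if len(caption_concepts) != len(image_concepts):
        raise ValueError("this sketch only supports equally sized concept lists")
    best_cost, best_edits = None, None
    for perm in permutations(image_concepts):
        pairs = list(zip(caption_concepts, perm))
        cost = sum(replace_cost(c, g) for c, g in pairs)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_edits = [(c, g) for c, g in pairs if c != g]
    return best_cost, best_edits


# A caption mentions "cat" and "car"; the image actually shows "dog" and "car".
print(minimal_edits(["cat", "car"], ["dog", "car"]))  # (2, [('cat', 'dog')])
```

Under these assumptions, the single edit "cat" to "dog" explains the
hallucination, and its cost of 2 reflects that the two concepts are siblings
in the hierarchy rather than unrelated objects.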