HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Journal:
arXiv
Published Date:
Jun 12, 2025
Abstract
Hallucinations pose a significant challenge to the reliability of large
vision-language models, making their detection essential for ensuring accuracy
in critical applications. Current detection methods often rely on
computationally intensive models, leading to high latency and resource demands.
Their definitive outcomes also fail to account for real-world scenarios where
the line between hallucinated and truthful information is unclear. To address
these issues, we propose HalLoc, a dataset designed for efficient,
probabilistic hallucination detection. It features 150K token-level annotated
samples, including hallucination types, across Visual Question Answering (VQA),
instruction-following, and image captioning tasks. This dataset facilitates the
development of models that detect hallucinations with graded confidence,
enabling more informed user interactions. Additionally, we introduce a baseline
model trained on HalLoc, offering low-overhead, concurrent hallucination
detection during generation. The model can be seamlessly integrated into
existing VLMs, improving reliability while preserving efficiency. The prospect
of a robust plug-and-play hallucination detection module opens new avenues for
enhancing the trustworthiness of vision-language models in real-world
applications. The HalLoc dataset and code are publicly available at:
https://github.com/dbsltm/cvpr25_halloc.