Don't Fight Hallucinations, Use Them: Estimating Image Realism using NLI over Atomic Facts
Journal: arXiv
Published Date: Mar 20, 2025
Abstract
Quantifying the realism of images remains a challenging problem in the field
of artificial intelligence. For example, an image of Albert Einstein holding a
smartphone violates common sense because modern smartphones were invented after
Einstein's death. We introduce a novel method for assessing image realism using
Large Vision-Language Models (LVLMs) and Natural Language Inference (NLI). Our
approach is based on the premise that LVLMs may generate hallucinations when
confronted with images that defy common sense. Using an LVLM to extract atomic
facts from these images, we obtain a mix of accurate facts and erroneous
hallucinations. We then compute pairwise entailment scores among
these facts and aggregate them into a single reality
score. This process identifies contradictions between genuine facts and
hallucinatory elements, signaling the presence of images that violate common
sense. Our approach achieves new state-of-the-art performance in the
zero-shot setting on the WHOOPS! dataset.
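
The scoring step described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation. The names `reality_score` and `toy_nli`, the score range of [-1, 1], and the use of `min` as the default aggregation are all assumptions made for this example; the abstract does not specify the NLI model or the aggregation function. In practice, `toy_nli` would be replaced by a real NLI model and the fact list would come from an LVLM.

```python
from itertools import permutations
from statistics import fmean
from typing import Callable, Iterable, Sequence

# Hypothetical scorer type: (premise, hypothesis) -> score in [-1, 1].
# Negative values mean contradiction, positive values mean entailment.
NliScorer = Callable[[str, str], float]


def reality_score(
    facts: Sequence[str],
    nli_score: NliScorer,
    aggregate: Callable[[Iterable[float]], float] = min,
) -> float:
    """Aggregate pairwise NLI scores over atomic facts into one number.

    The aggregation function is an assumption: `min` flags an image as
    soon as any pair of facts contradicts; `fmean` averages all pairs.
    """
    # NLI is directional, so score every ordered (premise, hypothesis) pair.
    pair_scores = [nli_score(p, h) for p, h in permutations(facts, 2)]
    if not pair_scores:
        return 0.0  # fewer than two facts: no evidence either way
    return aggregate(pair_scores)


# Toy stand-in for an NLI model: a hand-written table of contradicting pairs.
CONTRADICTIONS = {
    frozenset({
        "The man is holding a smartphone.",
        "The photo was taken in the 1940s.",
    }),
}


def toy_nli(premise: str, hypothesis: str) -> float:
    if frozenset({premise, hypothesis}) in CONTRADICTIONS:
        return -1.0
    return 0.0  # treat every other pair as neutral


weird_facts = [
    "The man has white hair.",
    "The man is holding a smartphone.",
    "The photo was taken in the 1940s.",
]
normal_facts = [
    "The man has white hair.",
    "The man is holding a pipe.",
    "The photo was taken in the 1940s.",
]

weird_min = reality_score(weird_facts, toy_nli)
normal_min = reality_score(normal_facts, toy_nli)
weird_mean = reality_score(weird_facts, toy_nli, aggregate=fmean)
```

In this toy setup the image whose facts contradict each other receives a lower score than the consistent one, which is the signal the abstract describes. Which aggregation works best on real data is an empirical question that this sketch does not address.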