AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web
Journal: arXiv
Published Date: May 23, 2025
Abstract
Textual claims on social media are often accompanied by images that enhance
their credibility and reach, which raises concerns about the spread of
misinformation. Existing datasets for automated verification of image-text
claims remain limited: they often consist of synthetic claims and lack
evidence annotations that capture the reasoning behind each verdict. In this
work, we introduce AVerImaTeC, a dataset of 1,297 real-world image-text
claims. Each claim is annotated with question-answer (QA) pairs containing
evidence from the web, which break down the reasoning leading to the verdict.
We mitigate common challenges in fact-checking datasets such as contextual
dependence, temporal leakage, and evidence insufficiency, via claim
normalization, temporally constrained evidence annotation, and a two-stage
sufficiency check. We assess annotation consistency in AVerImaTeC through
inter-annotator studies, achieving $\kappa=0.742$ on verdicts and
$74.7\%$ consistency on QA pairs. We also propose a novel evaluation method for
evidence retrieval and conduct extensive experiments to establish baselines for
verifying image-text claims using open-web evidence.
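The abstract names temporally constrained evidence annotation as the guard against temporal leakage. The following is a minimal Python sketch of that idea. It assumes that each claim carries a date and that each QA pair records when its evidence was published. The field names, the verdict label, and the on-or-before cutoff rule are hypothetical and are not taken from the released dataset format.

```python
from dataclasses import dataclass, field
from datetime import date


# Hypothetical record layout; field names are illustrative, not the dataset's schema.
@dataclass
class QAPair:
    question: str
    answer: str
    source_url: str
    evidence_date: date  # publication date of the web evidence (assumed available)


@dataclass
class ImageTextClaim:
    claim_text: str      # normalized claim text
    image_path: str
    claim_date: date
    verdict: str         # hypothetical label string
    qa_pairs: list[QAPair] = field(default_factory=list)


def temporally_valid(claim: ImageTextClaim) -> list[QAPair]:
    """Keep QA pairs whose evidence was published on or before the claim date.

    Evidence published later (e.g. a fact-check of this very claim) could leak
    the verdict; the on-or-before cutoff is an assumption of this sketch.
    """
    return [qa for qa in claim.qa_pairs if qa.evidence_date <= claim.claim_date]


claim = ImageTextClaim(
    claim_text="The photo shows flooding in city X in April 2023.",
    image_path="images/example.jpg",
    claim_date=date(2023, 5, 1),
    verdict="Refuted",
    qa_pairs=[
        QAPair("When was the image first published online?",
               "It appeared in a 2019 news article.",
               "https://example.com/a", date(2019, 8, 14)),
        QAPair("Has this claim been fact-checked?",
               "Yes, it was rated false.",
               "https://example.com/b", date(2023, 6, 2)),
    ],
)
kept = temporally_valid(claim)
print(len(kept))  # 1: the post-claim fact-check is excluded
```

Filtering at the QA-pair level rather than dropping whole claims keeps the pre-claim evidence usable. A real pipeline would also need a policy for sources whose publication date is unknown, which this sketch does not model.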
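To make the reported agreement figure concrete, the sketch below computes Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The abstract does not state which kappa variant produced κ = 0.742, so the two-annotator Cohen formulation is an assumption. The labels are toy data, not AVerImaTeC annotations.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label counts.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is
        # undefined, and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["Supported", "Supported", "Refuted", "Refuted"]
b = ["Supported", "Refuted", "Refuted", "Refuted"]
print(cohens_kappa(a, b))  # 0.5
```

In this toy case the annotators agree on 3 of 4 items (p_o = 0.75), but chance alone would yield p_e = 0.5, so κ = 0.5. Kappa discounts agreement that skewed label distributions would produce anyway.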