Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention
Journal: arXiv
Published Date: Apr 12, 2025
Abstract
The rapid growth of social media has led to the widespread dissemination of
fake news across multiple content forms, including text, images, audio, and
video. Traditional unimodal detection methods fall short in addressing complex
cross-modal manipulations; as a result, multimodal fake news detection has
emerged as a more effective solution. However, existing multimodal approaches,
particularly those designed for social media, often overlook the confounders
hidden within complex cross-modal interactions,
leading models to rely on spurious statistical correlations rather than genuine
causal mechanisms. In this paper, we propose the Causal Intervention-based
Multimodal Deconfounded Detection (CIMDD) framework, which systematically
models three types of confounders via a unified Structural Causal Model (SCM):
(1) Lexical Semantic Confounder (LSC); (2) Latent Visual Confounder (LVC); (3)
Dynamic Cross-Modal Coupling Confounder (DCCC). To mitigate the influence of
these confounders, we specifically design three causal modules based on
backdoor adjustment, frontdoor adjustment, and cross-modal joint intervention
to block spurious correlations from different perspectives and achieve causal
disentanglement of representations for deconfounded reasoning. Experimental
results on the FakeSV and FVC datasets demonstrate that CIMDD significantly
improves detection accuracy, outperforming state-of-the-art methods by 4.27%
and 4.80%, respectively. Furthermore, extensive experimental results indicate
that CIMDD exhibits strong generalization and robustness across diverse
multimodal scenarios.
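
As a rough illustration of the backdoor adjustment named above, the textbook formula P(Y | do(X=x)) = sum over z of P(Y | X=x, Z=z) P(Z=z) can be worked through on a toy discrete example. The sketch below is not the CIMDD module: the variables X, Y, Z and every probability in it are hypothetical values chosen only to show how a confounder Z produces a correlation between X and Y that disappears once Z is adjusted for.

```python
# Toy backdoor adjustment on a binary confounder Z (all numbers are
# hypothetical and chosen for illustration; they are not from the paper).
# Z influences both X (e.g. a surface cue) and Y (e.g. the "fake" label),
# while X itself has no causal effect on Y.

P_Z = {0: 0.5, 1: 0.5}                      # P(Z=z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {                           # P(Y=1 | X=x, Z=z); depends on z only
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.1, (1, 1): 0.5,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x: int) -> float:
    """P(Y=1 | X=x): weights each z by P(Z=z | X=x), so Z leaks in."""
    joint_xz = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}
    p_x = sum(joint_xz.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint_xz[z] / p_x for z in P_Z)


def backdoor_adjusted(x: int) -> float:
    """P(Y=1 | do(X=x)): weights each z by the marginal P(Z=z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: observational={observational(x):.2f}, "
              f"adjusted={backdoor_adjusted(x):.2f}")
```

In this toy setting the observational estimates differ between X=0 and X=1 even though X has no effect on Y, while the adjusted estimates coincide. This is the kind of spurious correlation that deconfounding aims to remove.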