CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos
Journal:
arXiv
Published Date:
Mar 24, 2025
Abstract
Video Anomaly Detection (VAD) remains a fundamental yet formidable task in
the video understanding community, with promising applications in areas such as
information forensics and public safety protection. Due to the rarity and
diversity of anomalies, existing methods use only easily collected regular
events to model the inherent normality of spatial-temporal patterns in
an unsupervised manner. Previous studies have shown that existing unsupervised
VAD models are incapable of handling label-independent data offsets (e.g., scene
changes) in real-world scenarios and may fail to respond to light anomalies due
to the overgeneralization of deep neural networks. Inspired by causality
learning, we argue that there exist causal factors that can adequately
generalize the prototypical patterns of regular events and present significant
deviations when anomalous instances occur. In this regard, we propose Causal
Representation Consistency Learning (CRCL) to implicitly mine potential
scene-robust causal variables in unsupervised video normality learning.
Specifically, building on structural causal models, we propose
scene-debiasing learning and causality-inspired normality learning to strip
away entangled scene bias in deep representations and learn causal video
normality, respectively. Extensive experiments on benchmarks validate the
superiority of our method over conventional deep representation learning.
Moreover, ablation studies and extension validation show that CRCL can cope
with label-independent biases in multi-scene settings and maintain stable
performance with only limited training data available.