Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets
Journal: arXiv
Published Date: May 27, 2025
Abstract
Dataset distillation (DD) has emerged as a powerful paradigm for dataset
compression, enabling the synthesis of compact surrogate datasets that
approximate the training utility of large-scale ones. While significant
progress has been achieved in distilling image datasets, extending DD to the
video domain remains challenging due to the high dimensionality and temporal
complexity inherent in video data. Existing video distillation (VD) methods
often suffer from excessive computational costs and struggle to preserve
temporal dynamics, as naïve extensions of image-based approaches typically
lead to degraded performance. In this paper, we propose a novel uni-level video
dataset distillation framework that directly optimizes synthetic videos with
respect to a pre-trained model. To address temporal redundancy and enhance
motion preservation, we introduce a temporal saliency-guided filtering
mechanism that leverages inter-frame differences to guide the distillation
process, encouraging the retention of informative temporal cues while
suppressing frame-level redundancy. Extensive experiments on standard video
benchmarks demonstrate that our method achieves state-of-the-art performance,
bridging the gap between real and distilled video data and offering a scalable
solution for video dataset compression.
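
The abstract does not specify how the temporal saliency signal is computed beyond saying that it leverages inter-frame differences. As a rough illustration only, and not the authors' implementation, one could score each frame by its mean absolute difference from the previous frame and use the normalized scores to weight per-frame updates. The function names `temporal_saliency` and `weight_frames`, the (T, H, W, C) array layout, and the normalization to a sum of 1 are all assumptions made for this sketch.

```python
import numpy as np


def temporal_saliency(video: np.ndarray) -> np.ndarray:
    """Hypothetical per-frame saliency from inter-frame differences.

    video: float array of shape (T, H, W, C).
    Returns an array of shape (T,) with non-negative weights summing to 1.
    """
    num_frames = video.shape[0]
    if num_frames < 2:
        # A single frame has no motion signal; fall back to uniform weights.
        return np.full(num_frames, 1.0 / max(num_frames, 1))

    # Mean absolute difference between each frame and its predecessor.
    diffs = np.abs(video[1:] - video[:-1]).mean(axis=(1, 2, 3))  # shape (T-1,)

    # The first frame has no predecessor, so it reuses the first difference
    # (an arbitrary choice made for this sketch).
    scores = np.concatenate([diffs[:1], diffs])  # shape (T,)

    total = scores.sum()
    if total == 0:
        # A completely static clip: every frame is equally (un)informative.
        return np.full(num_frames, 1.0 / num_frames)
    return scores / total


def weight_frames(per_frame_update: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale a (T, H, W, C) update by per-frame weights of shape (T,)."""
    return per_frame_update * weights[:, None, None, None]
```

Under these assumptions, frames that differ strongly from their predecessor receive larger weights, while frames in static stretches are down-weighted, which is one simple way to favor temporal cues over frame-level redundancy.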