Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Journal: arXiv
Published Date: May 17, 2025
Abstract
The increasing deployment of Large Vision-Language Models (LVLMs) raises
safety concerns about potentially malicious inputs. However, existing multimodal
safety evaluations primarily focus on model vulnerabilities exposed by static
image inputs, ignoring the temporal dynamics of video that may induce distinct
safety risks. To bridge this gap, we introduce Video-SafetyBench, the first
comprehensive benchmark designed to evaluate the safety of LVLMs under
video-text attacks. It comprises 2,264 video-text pairs spanning 48
fine-grained unsafe categories, each pairing a synthesized video with either a
harmful query, which contains explicit malice, or a benign query, which appears
harmless but triggers harmful behavior when interpreted alongside the video. To
generate semantically accurate videos for safety evaluation, we design a
controllable pipeline that decomposes video semantics into subject images (what
is shown) and motion text (how it moves), which jointly guide the synthesis of
query-relevant videos. To effectively evaluate uncertain or borderline harmful
outputs, we propose RJScore, a novel LLM-based metric that incorporates the
confidence of judge models and human-aligned decision threshold calibration.
Extensive experiments show that benign-query video composition achieves an average
attack success rate of 67.2%, revealing consistent vulnerabilities to
video-induced attacks. We believe Video-SafetyBench will catalyze future
research into video-based safety evaluation and defense strategies.
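
As an illustration only, the two ingredients the abstract attributes to RJScore (judge-model confidence and a decision threshold calibrated against human judgments) could be combined roughly as follows. The abstract does not give the metric's definition, so everything here is an assumption: the 1-to-5 score scale, the probability-weighted expected score, the accuracy-based threshold search, and all function names are hypothetical and are not the authors' implementation.

```python
from typing import Dict, Sequence


def expected_risk_score(score_probs: Dict[int, float]) -> float:
    """Confidence-weighted score: the judge's probability mass over
    discrete scores (assumed 1..5) is normalized and averaged."""
    total = sum(score_probs.values())
    if total <= 0:
        raise ValueError("score probabilities must sum to a positive value")
    return sum(score * prob for score, prob in score_probs.items()) / total


def calibrate_threshold(
    scores: Sequence[float],
    human_labels: Sequence[int],
    candidates: Sequence[float],
) -> float:
    """Pick the candidate threshold whose harmful/harmless decisions
    agree most often with human labels (1 = harmful, 0 = harmless)."""
    if not scores or len(scores) != len(human_labels):
        raise ValueError("scores and human_labels must be non-empty and aligned")
    best_threshold, best_accuracy = candidates[0], -1.0
    for threshold in candidates:
        agreements = sum(
            (score > threshold) == bool(label)
            for score, label in zip(scores, human_labels)
        )
        accuracy = agreements / len(scores)
        if accuracy > best_accuracy:  # ties keep the earliest candidate
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold


def attack_success_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of responses whose score exceeds the decision threshold."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(score > threshold for score in scores) / len(scores)


# Toy data, invented for illustration; not taken from the benchmark.
toy_scores = [1.2, 2.8, 3.4, 4.6]
toy_human_labels = [0, 0, 1, 1]
toy_threshold = calibrate_threshold(toy_scores, toy_human_labels, [2.0, 3.0, 4.0])
toy_asr = attack_success_rate(toy_scores, toy_threshold)
```

Under these assumptions, a response is counted as a successful attack when its confidence-weighted score exceeds the calibrated threshold, and the attack success rate is the share of such responses.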