DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

Journal: arXiv

Published Date: Jun 11, 2025

Abstract

Large Vision-Language Models (LVLMs) have achieved impressive progress across various applications but remain vulnerable to malicious queries that exploit the visual modality. Existing alignment approaches typically fail to resist malicious queries while preserving utility on benign ones effectively. To address these challenges, we propose Deep Aligned Visual Safety Prompt (DAVSP), which is built upon two key innovations. First, we introduce the Visual Safety Prompt, which appends a trainable padding region around the input image. It preserves visual features and expands the optimization space. Second, we propose Deep Alignment, a novel approach to train the visual safety prompt through supervision in the model's activation space. It enhances the inherent ability of LVLMs to perceive malicious queries, achieving deeper alignment than prior works. Extensive experiments across five benchmarks on two representative LVLMs demonstrate that DAVSP effectively resists malicious queries while preserving benign input utility. Furthermore, DAVSP exhibits great cross-model generation ability. Ablation studies further reveal that both the Visual Safety Prompt and Deep Alignment are essential components, jointly contributing to its overall effectiveness. The code is publicly available at https://github.com/zhangyitonggg/DAVSP.

Authors

Yitong Zhang
Jia Li
Liyi Cai
Ge Li

External Resources

View on arXiv arXiv (http://arxiv.org/abs/2506.09353v1)

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

Abstract

Authors

Categories

External Resources

Popular Topics

Recent Journals

DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt

Abstract

Authors

Categories

External Resources

Don't Miss the Future of Medicine

Popular Topics

Recent Journals