Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs
Journal:
arXiv
Published Date:
Feb 16, 2025
Abstract
Multimodal Large Language Models (MLLMs) have expanded the capabilities of
traditional language models by enabling interaction through both text and
images. However, ensuring the safety of these models remains a significant
challenge, particularly in accurately identifying whether multimodal content is
safe or unsafe-a capability we term safety awareness. In this paper, we
introduce MMSafeAware, the first comprehensive multimodal safety awareness
benchmark designed to evaluate MLLMs across 29 safety scenarios with 1500
carefully curated image-prompt pairs. MMSafeAware includes both unsafe and
over-safety subsets to assess models abilities to correctly identify unsafe
content and avoid over-sensitivity that can hinder helpfulness. Evaluating nine
widely used MLLMs using MMSafeAware reveals that current models are not
sufficiently safe and often overly sensitive; for example, GPT-4V misclassifies
36.1% of unsafe inputs as safe and 59.9% of benign inputs as unsafe. We further
explore three methods to improve safety awareness-prompting-based approaches,
visual contrastive decoding, and vision-centric reasoning fine-tuning-but find
that none achieve satisfactory performance. Our findings highlight the profound
challenges in developing MLLMs with robust safety awareness, underscoring the
need for further research in this area. All the code and data will be publicly
available to facilitate future research.