Safety at Scale: A Comprehensive Survey of Large Model Safety
Journal: arXiv
Published Date: Feb 2, 2025
Abstract
The rapid advancement of large models, driven by their exceptional abilities
in learning and generalization through large-scale pre-training, has reshaped
the landscape of Artificial Intelligence (AI). These models are now
foundational to a wide range of applications, including conversational AI,
recommendation systems, autonomous driving, content generation, medical
diagnostics, and scientific discovery. However, their widespread deployment
also exposes them to significant safety risks, raising concerns about
robustness, reliability, and ethical implications. This survey provides a
systematic review of current safety research on large models, covering Vision
Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language
Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models
(DMs), and large-model-based Agents. Our contributions are summarized as
follows: (1) We present a comprehensive taxonomy of safety threats to these
models, including adversarial attacks, data poisoning, backdoor attacks,
jailbreak and prompt injection attacks, energy-latency attacks, data and model
extraction attacks, and emerging agent-specific threats. (2) We review defense
strategies proposed for each type of attack, where available, and summarize the
commonly used datasets and benchmarks for safety research. (3) Building on
this, we identify and discuss the open challenges in large model safety,
emphasizing the need for comprehensive safety evaluations, scalable and
effective defense mechanisms, and sustainable data practices. More importantly,
we highlight the necessity of collective efforts from the research community
and international collaboration. Our work can serve as a useful reference for
researchers and practitioners, fostering the ongoing development of
comprehensive defense systems and platforms to safeguard AI models.