Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025
Journal: arXiv
Published Date: Jun 14, 2025
Abstract
Multimodal Large Language Models (MLLMs) have enabled transformative
advancements across diverse applications but remain susceptible to safety
threats, especially jailbreak attacks that induce harmful outputs. To
systematically evaluate and improve their safety, we organized the Adversarial
Testing & Large-model Alignment Safety Grand Challenge (ATLAS) 2025. This
technical report presents findings from the competition, which involved 86
teams testing MLLM vulnerabilities via adversarial image-text attacks in two
phases: white-box and black-box evaluations. The competition results highlight
ongoing challenges in securing MLLMs and provide valuable guidance for
developing stronger defense mechanisms. The challenge establishes new
benchmarks for MLLM safety evaluation and lays the groundwork for safer
multimodal AI systems. The code and data for this challenge are openly
available at https://github.com/NY1024/ATLAS_Challenge_2025.