Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models
Journal:
arXiv
Published Date:
Apr 19, 2025
Abstract
To develop trustworthy Vision-Language Models (VLMs), it is essential to
address adversarial robustness and hallucination mitigation, both of which
impact factual accuracy in high-stakes applications such as defense and
healthcare. Existing methods primarily focus on either adversarial defense or
hallucination post-hoc correction, leaving a gap in unified robustness
strategies. We introduce \textbf{Hydra}, an adaptive agentic framework that
enhances plug-in VLMs through iterative reasoning, structured critiques, and
cross-model verification, improving both resilience to adversarial
perturbations and intrinsic model errors. Hydra employs an Action-Critique
Loop, where it retrieves and critiques visual information, leveraging
Chain-of-Thought (CoT) and In-Context Learning (ICL) techniques to refine
outputs dynamically. Unlike static post-hoc correction methods, Hydra adapts to
both adversarial manipulations and intrinsic model errors, making it robust to
malicious perturbations and hallucination-related inaccuracies. We evaluate
Hydra on four VLMs, three hallucination benchmarks, two adversarial attack
strategies, and two adversarial defense methods, assessing performance on both
clean and adversarial inputs. Results show that Hydra surpasses plug-in VLMs
and state-of-the-art (SOTA) dehallucination methods, even without explicit
adversarial defenses, demonstrating enhanced robustness and factual
consistency. By bridging adversarial resistance and hallucination mitigation,
Hydra provides a scalable, training-free solution for improving the reliability
of VLMs in real-world applications.