Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support.
Journal:
Communications medicine
Published Date:
Aug 2, 2025
Abstract
BACKGROUND: Large language models (LLMs) show promise in clinical contexts but can generate false facts (often referred to as "hallucinations"). One subset of these errors arises from adversarial attacks, in which fabricated details embedded in prompts lead the model to produce or elaborate on the false information. We embedded fabricated content in clinical prompts to elicit adversarial hallucination attacks in multiple large language models. We quantified how often they elaborated on false details and tested whether a specialized mitigation prompt or altered temperature settings reduced errors.
Authors
Keywords
No keywords available for this article.