LLMs Can Do Medical Harm: Stress-Testing Clinical Decisions Under Social Pressure
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Large language models (LLMs) are entering clinical workflows, yet their effect on clinical decisions and potential for harm are uncertain. We measured harmful decision outputs from an ensemble of 20 LLMs across >10 million clinical scenarios involving safety or ethical dilemmas. Each case was presented under a neutral control and six Milgram-style social-pressure conditions, with or without a brief mitigation cue (“verify or escalate if unsafe”). The primary outcome was the proportion of potentially harmful responses. We used two-proportion and χ² tests, with confirmatory mixed-effects logistic models. Across all runs (N = 10,096,800), LLMs produced 1.18 million potentially harmful outputs (11.7%). Mitigation reduced harmful decisions from 16.6% to 10.1% (p < 0.001). Under social pressure, models behaved predictably but unevenly: prompts framed as authority or responsibility transfer generated the most harmful responses, whereas neutral, pressure-free control prompts produced the fewest (mitigated 8.3–9.6%; unmitigated 14.3–16.0%; χ² p < 0.001). In other words, when told what to do, or told that someone else would take responsibility, models were more likely to comply, even when the instruction was unsafe. These effects were consistent across datasets and models. LLMs can generate harmful medical decisions at scale. A brief safety reminder reduces, but does not eliminate, this behavior. These results highlight the need to measure harm propensity as a core performance metric and to maintain guardrails and continuous physician oversight before integrating LLMs into clinical decision-making.
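The abstract names a two-proportion test as one of its primary analyses. As a minimal sketch of how such a comparison might be computed, the Python below implements a standard pooled two-proportion z-test using only the standard library. The function name `two_proportion_z_test` and the arm sizes of 1,000 runs each are hypothetical: the abstract does not report per-arm sample sizes, so the counts are chosen only to mirror the reported 16.6% and 10.1% rates and are not the study's data or necessarily its exact procedure.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled standard error).

    x1, x2: number of harmful responses in each arm
    n1, n2: total number of responses in each arm
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion under the null hypothesis that both arms share one rate
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts: 1,000 runs per arm, matching only the reported rates
# (16.6% unmitigated vs. 10.1% mitigated). Not the study's actual sample sizes.
z, p = two_proportion_z_test(166, 1000, 101, 1000)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With the study's actual run counts in the millions, the same difference in rates would yield a far larger test statistic, which is consistent with the reported p < 0.001.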