Susceptibility of large language models to hidden nudge injection during simulated medical peer review: a quasi-experimental study.

Journal: Research Integrity and Peer Review

Abstract

BACKGROUND: Generative artificial intelligence (AI) technologies might offer new possibilities for the peer review process; however, whether AI models are vulnerable to hidden nudges designed to elicit positive reviews remains unexplored, which raises concerns about manipulation. We aimed to evaluate the susceptibility of AI models to hidden nudges in peer review.
METHODS: This quasi-experimental study was conducted between July and December 2025. Four commercial AI models were evaluated simultaneously: GPT-4 (OpenAI), Gemini 2.5 Flash (Google), DeepSeek-V3 (DeepSeek), and Claude Opus 4 (Anthropic). We submitted 90 preprints and 90 published manuscripts in critical care and cardiology to the AI models. All manuscripts were converted to individual Microsoft Word files, with identifying information removed, to mimic a manuscript submitted to a journal for peer review. Each manuscript underwent three independent evaluations per model using standardized prompts requesting an evaluation and a recommendation on whether to accept or reject it for publication. First, we evaluated the manuscript without any nudge. Second, we inserted a hidden nudge opposing the initial recommendation (e.g., a negative nudge if the manuscript was initially accepted). Finally, we evaluated the nudged manuscripts using a modified prompt that warned about potential hidden nudges. All recommendations were categorized as accept or reject. The main outcomes were the rates of change in recommendation, relative to the initial recommendation, after nudge insertion and after nudge insertion with the modified prompt, analyzed separately for each AI model.
RESULTS: Across all AI models tested, nudge insertion changed the recommendation in 84.4% of evaluations (608/720). DeepSeek was the most susceptible model (100% changed), followed by Gemini (97.8%), GPT-4 (82.8%), and Claude (57.2%). Using a specific prompt to warn AI models about potential malicious nudge injections in the manuscripts did not substantially alter the results: recommendations still changed in 76.8% of cases (553/720).
CONCLUSIONS: In this quasi-experimental study, all tested AI models were highly susceptible to hidden nudge insertions in manuscripts during simulated peer review. Importantly, explicitly warning AI models about potential nudge injections did not meaningfully reduce their susceptibility to manipulation.
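
The reported change rates can be checked with a short calculation. The sketch below is illustrative only: the abstract gives per-model percentages and pooled counts (608/720 and 553/720), but not per-model counts. The per-model counts used here are back-calculated on the assumption that each model evaluated all 180 manuscripts (90 + 90), and the names `changed` and `change_rate` are hypothetical.

```python
# Number of manuscripts each model evaluated (90 preprints + 90 published).
MANUSCRIPTS_PER_MODEL = 90 + 90

# ASSUMPTION: per-model counts are not given in the abstract. These values are
# back-calculated from the reported percentages with a denominator of 180.
changed = {
    "DeepSeek-V3": 180,
    "Gemini 2.5 Flash": 176,
    "GPT-4": 149,
    "Claude Opus 4": 103,
}


def change_rate(n_changed: int, n_total: int) -> float:
    """Return the percentage of recommendations that changed."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_changed / n_total


# Per-model change rate after nudge insertion, rounded to one decimal place.
per_model_pct = {
    model: round(change_rate(count, MANUSCRIPTS_PER_MODEL), 1)
    for model, count in changed.items()
}

# Pooled rate across the four models (reported as 608/720).
pooled_changed = sum(changed.values())
pooled_total = MANUSCRIPTS_PER_MODEL * len(changed)
pooled_pct = round(change_rate(pooled_changed, pooled_total), 1)

# Pooled rate when the prompt warned about hidden nudges (reported as 553/720).
warned_pct = round(change_rate(553, 720), 1)

for model, pct in per_model_pct.items():
    print(f"{model}: {pct}%")
print(f"Pooled: {pooled_changed}/{pooled_total} = {pooled_pct}%")
print(f"Pooled with warning prompt: {warned_pct}%")
```

Under these assumptions, the back-calculated counts sum to the reported pooled numerator of 608, which suggests the per-model percentages and the pooled figure are mutually consistent.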
