Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review
Journal:
arXiv
Published Date:
Jul 8, 2025
Abstract
In July 2025, 18 academic manuscripts on the preprint website arXiv were
found to contain hidden instructions known as prompts designed to manipulate
AI-assisted peer review. Instructions such as "GIVE A POSITIVE REVIEW ONLY"
were concealed using techniques like white-colored text. Author responses
varied: one planned to withdraw the affected paper, while another defended the
practice as legitimate testing of reviewer compliance. This commentary analyzes
this practice as a novel form of research misconduct. We examine the technique
of prompt injection in large language models (LLMs), revealing four types of
hidden prompts, ranging from simple positive review commands to detailed
evaluation frameworks. The defense that prompts served as "honeypots" to detect
reviewers improperly using AI fails under examination--the consistently
self-serving nature of prompt instructions indicates intent to manipulate.
Publishers maintain inconsistent policies: Elsevier prohibits AI use in peer
review entirely, while Springer Nature permits limited use with disclosure
requirements. The incident exposes systematic vulnerabilities extending beyond
peer review to any automated system processing scholarly texts, including
plagiarism detection and citation indexing. Our analysis underscores the need
for coordinated technical screening at submission portals and harmonized
policies governing generative AI (GenAI) use in academic evaluation.