Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning
Journal:
arXiv
Published Date:
Feb 10, 2025
Abstract
Large language models (LLMs) are increasingly integrated into real-world
personalized applications through retrieval-augmented generation (RAG)
mechanisms to supplement their responses with domain-specific knowledge.
However, the valuable and often proprietary nature of the knowledge bases used
in RAG introduces the risk of unauthorized usage by adversaries. Existing
methods that can be generalized as watermarking techniques to protect these
knowledge bases typically involve poisoning or backdoor attacks. These
methods must alter the LLM's outputs on verification samples, which inevitably
makes the watermarks susceptible to anomaly detection and can even introduce
new security risks. To address these challenges, we propose \name{} for
'harmless' copyright protection of knowledge bases. Instead of manipulating
the LLM's final output, \name{} implants distinct yet benign verification behaviors
in the space of chain-of-thought (CoT) reasoning, maintaining the correctness
of the final answer. Our method has three main stages: (1) Generating CoTs: For
each verification question, we generate two 'innocent' CoTs, including a target
CoT for building watermark behaviors; (2) Optimizing Watermark Phrases and
Target CoTs: Inspired by our theoretical analysis, we optimize them to minimize
retrieval errors under the \emph{black-box} and \emph{text-only} setting of
the suspicious LLM, ensuring that only watermarked verification queries can
retrieve their corresponding target CoTs contained in the knowledge base; (3)
Ownership Verification: We exploit a pairwise Wilcoxon test to verify whether a
suspicious LLM is augmented with the protected knowledge base by comparing its
responses to watermarked and benign verification queries. Our experiments on
diverse benchmarks demonstrate that \name{} effectively protects knowledge
bases and resists adaptive attacks.
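
The ownership verification stage can be illustrated with a minimal sketch of a one-sided paired Wilcoxon signed-rank test. This is not the authors' implementation. The abstract does not say which quantity is compared, so the sketch assumes each verification question yields one numeric score for the watermarked query and one for the benign query (for example, a similarity between the suspicious LLM's response and the target CoT). The function names, the score inputs, and the significance level `alpha` are all hypothetical, and the p-value uses the normal approximation rather than an exact test.

```python
import math


def wilcoxon_signed_rank(x, y):
    """One-sided paired Wilcoxon signed-rank test (H1: x tends to exceed y).

    Normal approximation without tie or continuity correction.
    Returns (W_plus, p_value).
    """
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # W+ is the sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail P(Z >= z)
    return w_plus, p_value


def verify_ownership(watermarked_scores, benign_scores, alpha=0.05):
    """Hypothetical decision rule: flag the suspicious LLM if scores under
    watermarked queries are significantly higher than under benign ones."""
    _, p_value = wilcoxon_signed_rank(watermarked_scores, benign_scores)
    return p_value < alpha
```

Under these assumptions, calling `verify_ownership(watermarked_scores, benign_scores)` with paired per-question scores returns `True` only when the watermarked queries score significantly higher, which is the signal that the protected knowledge base is in use.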