PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models
Journal:
arXiv
Published Date:
Feb 19, 2025
Abstract
The rapid development of large language models (LLMs) is redefining the
landscape of human-computer interaction, and their integration into various
user-service applications is becoming increasingly prevalent. However,
transmitting user data to cloud-based LLMs presents significant risks of data
breaches and unauthorized access to personal identification information. In
this paper, we propose a privacy preservation pipeline for protecting privacy
and sensitive information during interactions between users and LLMs in
practical LLM usage scenarios. We construct SensitiveQA, the first privacy
open-ended question-answering dataset. It comprises 57k interactions in Chinese
and English, encompassing a diverse range of user-sensitive information within
the conversations. Our proposed solution employs a multi-stage strategy aimed
at preemptively securing user information while simultaneously preserving the
response quality of cloud-based LLMs. Experimental validation underscores our
method's efficacy in balancing privacy protection with maintaining robust
interaction quality. The code and dataset are available at
https://github.com/ligw1998/PRIV-QA.