SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Journal:
arXiv
Published Date:
Jun 25, 2025
Abstract
Federated Prompt Learning has emerged as a communication-efficient and
privacy-preserving paradigm for adapting large vision-language models like CLIP
across decentralized clients. However, the security implications of this setup
remain underexplored. In this work, we present the first study of backdoor
attacks in Federated Prompt Learning. We show that when malicious clients
inject visually imperceptible, learnable noise triggers into input images, the
global prompt learner becomes vulnerable to targeted misclassification while
still maintaining high accuracy on clean inputs. Motivated by this
vulnerability, we propose SABRE-FL, a lightweight, modular defense that filters
poisoned prompt updates using an embedding-space anomaly detector trained
offline on out-of-distribution data. SABRE-FL requires no access to raw client
data or labels and generalizes across diverse datasets. We show, both
theoretically and empirically, that malicious clients can be reliably
identified and filtered using an embedding-based detector. Across five diverse
datasets and four baseline defenses, SABRE-FL outperforms all baselines by
significantly reducing backdoor accuracy while preserving clean accuracy,
demonstrating strong empirical performance and underscoring the need for robust
prompt learning in future federated systems.