Backdoor Attacks Against Patch-based Mixture of Experts
Journal: arXiv
Published Date: May 3, 2025
Abstract
As Deep Neural Networks (DNNs) continue to require larger amounts of data and
computational power, Mixture of Experts (MoE) models have become a popular
choice to reduce computational complexity. This popularity increases the
importance of considering the security of MoE architectures. Unfortunately, the
security of models using an MoE architecture has received far less attention
than that of other DNN models. In this work, we investigate the vulnerability of
patch-based MoE (pMoE) models for image classification against backdoor
attacks. We examine multiple trigger generation methods and Fine-Pruning as a
defense. To better understand a pMoE model's vulnerability to backdoor attacks,
we investigate which factors affect the model's patch selection. Our work shows
that pMoE models are highly susceptible to backdoor attacks. More precisely, we
achieve attack success rates of up to 100% with visible triggers at a 2%
poisoning rate, while clean accuracy drops by only 1.0%. Additionally,
we show that pruning alone is ineffective as a defense but that fine-tuning
can remove the backdoor almost completely: fine-tuning the model for five
epochs reduces the attack success rate to 2.1% at the cost of 1.4% accuracy.
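
To make the terms "visible trigger", "poisoning rate", and "attack success rate" concrete, the following is a minimal sketch of a generic patch-trigger backdoor. It is not the paper's trigger generation method: the square trigger, its bottom-right position, the grayscale list-of-lists image format, and all function names (stamp_trigger, poison_dataset, attack_success_rate) are assumptions made for illustration only.

```python
import random


def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of a 2D grayscale image with a bright square stamped
    into the bottom-right corner (hypothetical visible trigger)."""
    patched = [row[:] for row in image]
    height, width = len(patched), len(patched[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            patched[r][c] = value
    return patched


def poison_dataset(images, labels, target_label, rate=0.02, seed=0):
    """Stamp the trigger onto a `rate` fraction of the samples and relabel
    them as `target_label`. Returns new lists plus the poisoned indices."""
    rng = random.Random(seed)
    n_poison = round(rate * len(images))
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images = [stamp_trigger(img) if i in chosen else img
                  for i, img in enumerate(images)]
    new_labels = [target_label if i in chosen else y
                  for i, y in enumerate(labels)]
    return new_images, new_labels, chosen


def attack_success_rate(predict, images, labels, target_label):
    """Fraction of triggered images from non-target classes that `predict`
    assigns to `target_label`. `predict` maps one image to one label."""
    victims = [img for img, y in zip(images, labels) if y != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(stamp_trigger(img)) == target_label for img in victims)
    return hits / len(victims)
```

With 100 training images and rate=0.02, poison_dataset modifies exactly two samples. Clean accuracy would be measured separately, on untriggered test images with their original labels.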