Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering
Journal: arXiv
Published Date: Sep 13, 2024
Abstract
Distributed learning has become the standard approach for training
large-scale machine learning models across private data silos. While
distributed learning enhances privacy preservation and training efficiency, it
faces critical challenges related to Byzantine robustness and communication
reduction. Existing Byzantine-robust and communication-efficient methods rely
on full gradient information, either at every iteration or, with some
probability, at certain iterations, and they converge only to an unnecessarily large
neighborhood around the solution. Motivated by these issues, we propose a novel
Byzantine-robust and communication-efficient stochastic distributed learning
method that imposes no requirements on batch size and converges to a smaller
neighborhood around the optimal solution than all existing methods, aligning
with the theoretical lower bound. Our key innovation is leveraging Polyak
momentum to mitigate the noise caused by both biased compressors and stochastic
gradients, thus defending against Byzantine workers under information
compression. We prove tight complexity bounds for our algorithm for
non-convex smooth loss functions and show that these
bounds match the lower bounds in Byzantine-free scenarios. Finally, we validate
the practical significance of our algorithm through an extensive series of
experiments, benchmarking its performance on both binary classification and
image classification tasks.
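
The mechanism described above (worker-side Polyak momentum, a biased compressor, and a Byzantine-robust filter at the server) can be illustrated with a minimal sketch. This is not the paper's algorithm. The abstract does not specify the compressor, the aggregation rule, or how compression is applied, so the sketch assumes top-k sparsification, an error-feedback-style scheme in which each worker sends only the compressed gap between its momentum and the server's estimate of it, a coordinate-wise median as the filter, and a toy quadratic objective. The names `top_k` and `simulate` and all parameter values are hypothetical.

```python
import numpy as np


def top_k(v, k):
    """Biased top-k compressor (assumed): keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out


def simulate(steps=500, n_honest=8, n_byz=2, dim=10, k=5,
             lr=0.1, beta=0.9, noise=0.5, seed=0):
    """Toy run on f(x) = 0.5 * ||x||^2 (minimizer at 0); returns the final x."""
    rng = np.random.default_rng(seed)
    x = np.full(dim, 5.0)
    momenta = np.zeros((n_honest, dim))             # worker-side Polyak momentum
    estimates = np.zeros((n_honest + n_byz, dim))   # server-side estimate per worker

    for _ in range(steps):
        for i in range(n_honest):
            # Stochastic gradient of the toy objective: true gradient x plus noise.
            grad = x + noise * rng.standard_normal(dim)
            # Polyak momentum averages out stochastic-gradient noise over time.
            momenta[i] = beta * momenta[i] + (1.0 - beta) * grad
            # Only the compressed gap is communicated (assumed scheme);
            # the server adds it to its running estimate for worker i.
            estimates[i] += top_k(momenta[i] - estimates[i], k)
        for j in range(n_honest, n_honest + n_byz):
            # Byzantine workers push their estimates arbitrarily far.
            estimates[j] += 100.0
        # Coordinate-wise median (assumed filter) ignores the minority of outliers.
        x = x - lr * np.median(estimates, axis=0)
    return x
```

Calling `simulate()` runs the toy problem with 8 honest and 2 Byzantine workers. Because the Byzantine workers are a minority, the median in each coordinate always lies within the range of the honest estimates, so the iterate is expected to settle in a small neighborhood of the minimizer rather than follow the attackers.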