$γ$-FedHT: Stepsize-Aware Hard-Threshold Gradient Compression in Federated Learning
Journal:
arXiv
Published Date:
May 18, 2025
Abstract
Gradient compression can effectively alleviate communication bottlenecks in
Federated Learning (FL). Contemporary state-of-the-art sparse compressors, such
as Top-$k$, exhibit high computational complexity, up to
$\mathcal{O}(d\log_2{k})$, where $d$ is the number of model parameters. The
hard-threshold compressor, which simply transmits elements with absolute values
higher than a fixed threshold, is thus proposed to reduce the complexity to
$\mathcal{O}(d)$. However, the hard-threshold compression causes accuracy
degradation in FL, where the datasets are non-IID and the stepsize $\gamma$ is
decreasing for model convergence. The decaying stepsize reduces the updates and
causes the compression ratio of the hard-threshold compression to drop rapidly
to an aggressive ratio. At or below this ratio, the model accuracy has been
observed to degrade severely. To address this, we propose $\gamma$-FedHT, a
stepsize-aware low-cost compressor with Error-Feedback to guarantee
convergence. Given that the traditional theoretical framework of FL does not
consider Error-Feedback, we introduce the fundamental conversation of
Error-Feedback. We prove that $\gamma$-FedHT has the convergence rate of
$\mathcal{O}(\frac{1}{T})$ ($T$ representing total training iterations) under
$\mu$-strongly convex cases and $\mathcal{O}(\frac{1}{\sqrt{T}})$ under
non-convex cases, \textit{same as FedAVG}. Extensive experiments demonstrate
that $\gamma$-FedHT improves accuracy by up to $7.42\%$ over Top-$k$ under
equal communication traffic on various non-IID image datasets.