Stratify: Rethinking Federated Learning for Non-IID Data through Balanced Sampling
Journal: arXiv
Published Date: Apr 18, 2025
Abstract
Federated Learning (FL) on non-independently and identically distributed
(non-IID) data remains a critical challenge, as existing approaches struggle
with severe data heterogeneity. Current methods primarily address symptoms of
non-IID data by applying incremental adjustments to Federated Averaging (FedAvg),
rather than directly resolving its inherent design limitations. Consequently,
performance significantly deteriorates under highly heterogeneous conditions,
as the fundamental issue of imbalanced exposure to diverse class and feature
distributions remains unresolved. This paper introduces Stratify, a novel FL
framework designed to systematically manage class and feature distributions
throughout training, effectively tackling the root cause of non-IID challenges.
Inspired by classical stratified sampling, our approach employs a Stratified
Label Schedule (SLS) to ensure balanced exposure across labels, significantly
reducing bias and variance in aggregated gradients. Complementing SLS, we
propose a label-aware client selection strategy, restricting participation
exclusively to clients possessing data relevant to scheduled labels.
Additionally, Stratify incorporates a fine-grained, high-frequency update
scheme, accelerating convergence and further mitigating data heterogeneity. To
uphold privacy, we implement a secure client selection protocol leveraging
homomorphic encryption, enabling precise global label statistics without
disclosing sensitive client information. Extensive evaluations on MNIST,
CIFAR-10, CIFAR-100, Tiny-ImageNet, COVTYPE, PACS, and Digits-DG demonstrate
that Stratify attains performance comparable to IID baselines, accelerates
convergence, and reduces client-side computation compared to state-of-the-art
methods, underscoring its practical effectiveness in realistic federated
learning scenarios.
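
The two central ideas of the abstract, a label schedule that gives every class equal exposure and a client selection step restricted to clients holding the scheduled labels, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names (`stratified_label_schedule`, `exposure_counts`, `select_clients`), the `labels_per_step` parameter, and the requirement that it divide the number of labels are all assumptions made for illustration. The sketch also reads per-client label counts in plaintext, whereas the abstract describes a homomorphic-encryption protocol for that step, which is omitted here.

```python
import random
from collections import Counter


def stratified_label_schedule(labels, steps, labels_per_step, seed=0):
    """Build a schedule in which labels are drawn without replacement
    from a shuffled pool, refilling the pool only once it is empty.

    Assumption (not from the paper): labels_per_step divides the number
    of labels, so no step straddles two passes over the label set.
    """
    labels = list(labels)
    if len(labels) % labels_per_step != 0:
        raise ValueError("labels_per_step must divide the number of labels")
    rng = random.Random(seed)
    schedule, pool = [], []
    for _ in range(steps):
        if not pool:
            pool = labels[:]
            rng.shuffle(pool)
        # Take the next labels_per_step labels from the current pass.
        step, pool = pool[:labels_per_step], pool[labels_per_step:]
        schedule.append(step)
    return schedule


def exposure_counts(schedule):
    """Count how often each label is scheduled across all steps."""
    return Counter(label for step in schedule for label in step)


def select_clients(client_label_counts, scheduled_labels):
    """Return the ids of clients holding at least one sample of any
    scheduled label. client_label_counts maps client id -> {label: count}
    and is read in plaintext here (hypothetical simplification)."""
    return sorted(
        cid
        for cid, counts in client_label_counts.items()
        if any(counts.get(label, 0) > 0 for label in scheduled_labels)
    )


# Hypothetical toy setup: 4 labels, 3 clients with skewed label holdings.
schedule = stratified_label_schedule(range(4), steps=4, labels_per_step=2)
clients = {"a": {0: 10}, "b": {1: 5, 2: 3}, "c": {3: 7}}
participants = [select_clients(clients, step) for step in schedule]
```

Because the pool is exhausted before it is refilled, every label is scheduled the same number of times whenever the number of steps covers whole passes over the label set, which is the balanced-exposure property that stratified sampling is meant to provide.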