Enabling Weak Client Participation via On-device Knowledge Distillation in Heterogeneous Federated Learning
Journal: arXiv
Published Date: Mar 14, 2025
Abstract
Online Knowledge Distillation (KD) has recently been highlighted as a way to train large
models in Federated Learning (FL) environments. Many existing studies adopt the
logit ensemble method to perform KD on the server side. However, they often
assume that unlabeled data collected at the edge is centralized on the server.
Moreover, the logit ensemble method personalizes local models, which can
degrade the quality of soft targets, especially when data is highly non-IID. To
address these critical limitations, we propose a novel on-device KD-based
heterogeneous FL method. Our approach leverages a small auxiliary model to
learn from labeled local data. Subsequently, a subset of clients with strong
system resources transfers knowledge to a large model through on-device KD
using their unlabeled data. Our extensive experiments demonstrate that our
on-device KD-based heterogeneous FL method effectively utilizes the system
resources of all edge devices as well as the unlabeled data, resulting in
higher accuracy than state-of-the-art (SOTA) KD-based FL methods.
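
The on-device KD step described in the abstract, where a small auxiliary model acts as teacher and a large model as student on a strong client's unlabeled data, can be illustrated with a minimal sketch. The abstract does not state the exact loss, so this sketch assumes the standard temperature-softened KL-divergence objective commonly used for knowledge distillation. The function names (`log_softmax`, `kd_loss`, `batch_kd_loss`), the default temperature, and the T^2 scaling are illustrative assumptions, not details taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one unlabeled sample.

    Assumption: the teacher is the small auxiliary model and the student is the
    large model; the T^2 factor is the usual gradient-scale correction in KD.
    """
    log_p = log_softmax(teacher_logits, temperature)  # soft targets
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


def batch_kd_loss(teacher_batch, student_batch, temperature=2.0):
    """Mean distillation loss over a batch of unlabeled samples (no labels needed)."""
    losses = [kd_loss(t, s, temperature) for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for two unlabeled samples over three classes.
    teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    student = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(batch_kd_loss(teacher, student))
```

Because the loss depends only on the two models' outputs, it can be evaluated on unlabeled data held by the client, which is the property that lets distillation run on-device rather than on the server.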