Route-and-Aggregate Decentralized Federated Learning Under Communication Errors
Journal: arXiv
Published Date: Mar 28, 2025
Abstract
Decentralized federated learning (D-FL) allows clients to aggregate learning
models locally, offering flexibility and scalability. Existing D-FL methods use
gossip protocols, which are inefficient when not all nodes in the network are
D-FL clients. This paper puts forth a new D-FL strategy, termed
Route-and-Aggregate (R&A) D-FL, where participating clients exchange models
with their peers through established routes (as opposed to flooding) and
adaptively normalize their aggregation coefficients to compensate for
communication errors. The impact of routing and imperfect links on the
convergence of R&A D-FL is analyzed, revealing that convergence is best
when routes with the minimum end-to-end packet error rates are employed to
deliver models. Our analysis is experimentally validated through three image
classification tasks and two next-word prediction tasks, utilizing widely
recognized datasets and models. R&A D-FL outperforms the flooding-based D-FL
method in terms of training accuracy by 35% in our tested 10-client network,
and shows strong synergy between D-FL and networking. In another test with 10
D-FL clients, the training accuracy of R&A D-FL with communication errors
approaches that of ideal centralized FL (C-FL) without communication errors, as the number
of routing nodes (i.e., nodes that do not participate in the training of D-FL)
rises to 28.
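
The two mechanisms named in the abstract, routing over the path with the minimum end-to-end packet error rate (PER) and renormalizing aggregation coefficients when some models are lost, can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes independent link errors, so that end-to-end success is the product of per-link success probabilities, and it assumes one plausible reading of "adaptive normalization": rescale the nominal coefficients of the models that actually arrived so that they sum to one. The function names `min_per_route` and `aggregate_received`, the dictionary-based topology, and the flat-list model representation are all hypothetical.

```python
import heapq
import math


def min_per_route(links, src, dst):
    """Find the route from src to dst with the lowest end-to-end PER.

    links: dict mapping node -> {neighbor: per-link packet error rate}.
    Returns (route, end_to_end_per), or (None, 1.0) if dst is unreachable.
    Assumption: link errors are independent, so success probabilities multiply.
    """
    # Minimizing the sum of -log(1 - PER) maximizes the product of (1 - PER),
    # which turns the problem into a shortest-path search (Dijkstra).
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, per in links.get(u, {}).items():
            if per >= 1.0:  # a link that always fails is unusable
                continue
            nd = d - math.log(1.0 - per)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None, 1.0
    # Walk the predecessor chain back from dst to rebuild the route.
    route = [dst]
    while route[-1] != src:
        route.append(prev[route[-1]])
    route.reverse()
    return route, 1.0 - math.exp(-dist[dst])


def aggregate_received(local_model, received, weights, self_id):
    """Average the local model with the peer models that arrived intact.

    received: {client_id: model as a list of floats} delivered this round.
    weights: {client_id: nominal aggregation coefficient}, including self_id.
    Assumption: coefficients of missing peers are dropped and the remaining
    ones are rescaled to sum to one.
    """
    models = dict(received)
    models[self_id] = local_model
    total = sum(weights[c] for c in models)
    out = [0.0] * len(local_model)
    for c, model in models.items():
        coef = weights[c] / total
        for i, x in enumerate(model):
            out[i] += coef * x
    return out
```

For example, with `links = {"A": {"B": 0.1, "R": 0.01}, "R": {"B": 0.02}, "B": {}}`, calling `min_per_route(links, "A", "B")` selects the two-hop route through the routing node `R`, because its success probability of 0.99 × 0.98 exceeds the 0.9 of the direct link. This is consistent with the abstract's observation that additional routing nodes can help, although the sketch says nothing about the paper's convergence analysis.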