Model-Based Diagnosis: Automating End-to-End Diagnosis of Network Failures
Journal: arXiv
Published Date: Jun 29, 2025
Abstract
Fast diagnosis and repair of enterprise network failures is critically
important because disruptions have a major business impact. Prior work has focused
on diagnosis primitives or procedures limited to a subset of the problem, such
as only data plane or only control plane faults. This paper proposes a new
paradigm, model-based network diagnosis, that provides a systematic way to
derive automated procedures for identifying the root cause of network failures,
based on reports of end-to-end user-level symptoms. The diagnosis procedures
are systematically derived from a model of packet forwarding and routing,
covering hardware, firmware, and software faults in both the data plane and
distributed control plane. These automated procedures replace manual diagnosis
by an experienced human operator and dramatically accelerate it. Model-based diagnosis is
inspired by, leverages, and is complementary to recent work on network
verification. We have built NetDx, a proof-of-concept implementation of
model-based network diagnosis. We deployed NetDx on a new emulator of networks
consisting of P4 switches with distributed routing software. We validated the
robustness and coverage of NetDx with an automated fault injection campaign, in
which 100% of the injected faults were diagnosed correctly. Furthermore, on a data set of 33
faults from a large cloud provider that are within the domain targeted by
NetDx, 30 are diagnosed in seconds instead of hours.