Dataset-centric evaluation of federated intrusion detection models in IoT networks.

Journal: Scientific Reports
Published Date:

Abstract

Intrusion detection systems (IDS) leveraging federated learning (FL) are increasingly deployed in Internet of Things (IoT) environments to address distributed data and privacy constraints. However, their generalization remains unclear because most evaluations rely on a single dataset, which risks overfitting to site-specific traffic, label taxonomies, and non-IID client mixtures. This study provides a comprehensive dataset-centric evaluation of FL-based IDS across three contemporary IoT/IIoT datasets, Edge-IIoTset (2022), CIC-IoT2023, and TII-SSRC-23 (2023), which differ in devices, feature distributions, and attack families. We benchmark three FL aggregation algorithms (FedAvg, FedProx, FedNova) with two deep learning backbones (LSTM and Transformer) to assess detection accuracy, cross-environment generalizability, convergence behavior, and communication cost. Methodologically, we construct non-IID clients by device or application type, harmonize labels to a common family-level schema, align features to the intersection set, and evaluate three regimes: in-domain, cross-dataset, and a combined multi-dataset federation. Results show that federated models approach centralized performance in-domain, with macro-F1 up to 98% and accuracies in the 92-98% range. Transformers consistently exceed LSTMs by ≈1-2 percentage points in macro-F1 at comparable communication budgets. Cross-dataset tests expose substantial degradation, with up to 30 percentage points of macro-F1 loss when models face unseen environments, underscoring the need for diverse training coverage. Combined multi-dataset federation substantially restores transfer, yielding ≈90% macro-F1 across datasets in the harmonized family-level setting. Under heterogeneous clients, FedProx improves stability by reducing round-to-round variance, while FedNova reaches target accuracy in fewer rounds and lowers communication by ≈15-25% relative to FedAvg. These findings indicate a practical recipe for deployment: prioritize attack and environment diversity through combined-dataset FL, select Transformer backbones where feasible, and use FedProx or FedNova to stabilize training and reduce communication in bandwidth-constrained IoT settings.
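The three aggregation strategies compared above differ mainly in how client updates are weighted and regularized. The abstract does not give implementation details, so the following is only a minimal sketch of the commonly used textbook formulations, assuming plain local SGD, flat weight vectors, and no momentum. All function and parameter names (`fedavg`, `fednova`, `fedprox_gradient`, `mu`, `local_steps`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def fedavg(client_weights: Sequence[Vector], num_samples: Sequence[int]) -> Vector:
    """FedAvg: average client weights, weighted by local sample counts."""
    total = sum(num_samples)
    dim = len(client_weights[0])
    return [
        sum((n / total) * w[j] for w, n in zip(client_weights, num_samples))
        for j in range(dim)
    ]


def fednova(
    global_w: Vector,
    client_weights: Sequence[Vector],
    num_samples: Sequence[int],
    local_steps: Sequence[int],
) -> Vector:
    """FedNova (plain-SGD case, an assumption of this sketch): normalize each
    client's update by its number of local steps, then rescale the weighted
    average by the effective step count."""
    total = sum(num_samples)
    p = [n / total for n in num_samples]
    tau_eff = sum(pi * t for pi, t in zip(p, local_steps))
    new_w = []
    for j in range(len(global_w)):
        # Weighted sum of per-step (normalized) client updates.
        d = sum(
            pi * (global_w[j] - w[j]) / t
            for pi, w, t in zip(p, client_weights, local_steps)
        )
        new_w.append(global_w[j] - tau_eff * d)
    return new_w


def fedprox_gradient(grad: Vector, local_w: Vector, global_w: Vector, mu: float) -> Vector:
    """FedProx: add the gradient of the proximal term (mu/2)*||w - w_global||^2
    to the local loss gradient, pulling local weights toward the global model."""
    return [g + mu * (lw - gw) for g, lw, gw in zip(grad, local_w, global_w)]


if __name__ == "__main__":
    clients = [[1.0, 2.0], [3.0, 4.0]]
    print(fedavg(clients, [1, 3]))
    print(fednova([0.0, 0.0], clients, [1, 3], [2, 5]))
    print(fedprox_gradient([1.0], [2.0], [1.0], mu=0.5))
```

In this sketch, when every client runs the same number of local steps, `fednova` returns the same result as `fedavg`. The two differ only when step counts are heterogeneous, which is the setting the abstract highlights.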

Authors

  • Muhammad Ahmad Bilal
    Department of Computer Software Engineering, Military College of Signals, National University of Sciences and Technology, Islamabad, Pakistan.
  • Ihtesham Ul Islam
    Department of Computer Software Engineering, Military College of Signals, National University of Sciences and Technology, Islamabad, Pakistan.
  • Sarmad Idrees
    Department of Information Security, Military College of Signals, National University of Sciences and Technology, Islamabad, Pakistan.
  • Muhammad Qasim
    Microelement Research Center, College of Resources and Environment, Huazhong Agricultural University, Wuhan, Hubei-40070, China.
  • Muhammad Junaid Khan
    Department of Electrical Engineering, Military College of Signals, National University of Sciences and Technology, Islamabad, Pakistan.
  • Jaleed Khan
    Medical Sciences Division, University of Oxford, Oxford, Oxfordshire, OX3 9DU, UK.