Multi-teacher knowledge distillation framework for lightweight anomaly detection.
Journal:
Neural networks : the official journal of the International Neural Network Society
Published Date:
Oct 30, 2025
Abstract
Anomaly detection is essential in many domains where identifying rare and irregular samples is critical for system safety and security. The task is hampered by extreme class imbalance: normal samples vastly outnumber anomalous ones, which makes it difficult for conventional learning models to identify anomalies effectively. This paper introduces a multi-teacher knowledge distillation (MTKD) framework that, for the first time, integrates knowledge distillation with multiple resampling strategies to address imbalanced learning, while also compressing the model for efficient deployment. The proposed method trains multiple teacher models on versions of the dataset resampled with diverse oversampling and undersampling techniques. By distilling knowledge from these teachers, a compact student model learns a balanced representation of normal and anomalous samples. The paper also provides a theoretical analysis showing that the proposed distillation algorithm correctly identifies class distinctions. The algorithm improves generalization, reduces overfitting, and increases robustness to corrupted or noisy data, which supports its practical utility under diverse and challenging conditions. Although the multi-teacher setup requires additional computational resources during training, the resulting compressed student model offers significant advantages in accuracy, efficiency, and inference speed, making it well suited to real-time anomaly detection. We evaluate the MTKD framework on six datasets, namely WUSTL-EHMS, Credit-Card-Fraud, TON-IoT, KDD99, HYPERAKTIV, and ICU-IoT-Flock, spanning fraud detection, intrusion detection, and healthcare monitoring; the results demonstrate its domain-agnostic effectiveness across diverse real-world scenarios.
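To make the training pipeline described in the abstract concrete, here is a minimal sketch of multi-teacher distillation with resampling for binary anomaly detection. It is not the authors' implementation: the abstract does not specify teacher or student architectures, which resampling techniques are used, or how the distillation loss is weighted. The sketch therefore assumes (1) plain random oversampling and random undersampling, (2) logistic-regression teachers on quadratic features as the "larger" models and a linear logistic-regression student as the "compact" model, (3) teacher soft probabilities averaged with equal weight, and (4) a student target that blends hard labels and soft labels with a hypothetical weight `alpha`, with no temperature scaling. All function names (`random_oversample`, `distill`, and so on) are illustrative only.

```python
import numpy as np


def random_oversample(X, y, rng):
    # Duplicate random minority (label 1) samples until both classes are the same size.
    minority, majority = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]


def random_undersample(X, y, rng):
    # Keep a random subset of majority samples equal in size to the minority class.
    minority, majority = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([kept, minority])
    return X[idx], y[idx]


def sigmoid(z):
    # Clip logits to avoid overflow warnings in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def quadratic_features(X):
    # Hypothetical "teacher" feature map: raw inputs plus all pairwise products.
    i, j = np.triu_indices(X.shape[1])
    return np.hstack([X, X[:, i] * X[:, j]])


def fit_logistic(F, targets, lr=0.5, epochs=500):
    # Full-batch gradient descent on binary cross-entropy over standardized features.
    # Targets may be soft labels in [0, 1]. Returns (predict_fn, parameter_count).
    mu, sd = F.mean(axis=0), F.std(axis=0) + 1e-12
    Z = (F - mu) / sd
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        g = sigmoid(Z @ w + b) - targets
        w -= lr * Z.T @ g / len(targets)
        b -= lr * g.mean()
    return (lambda F_new: sigmoid(((F_new - mu) / sd) @ w + b)), w.size + 1


def distill(X, y, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    teacher_probs, teacher_params = [], []
    # Train one teacher per resampling strategy on its own rebalanced dataset.
    for resample in (random_oversample, random_undersample):
        Xr, yr = resample(X, y, rng)
        predict, n_params = fit_logistic(quadratic_features(Xr), yr)
        teacher_probs.append(predict(quadratic_features(X)))
        teacher_params.append(n_params)
    soft = np.mean(teacher_probs, axis=0)
    # BCE is linear in the target, so this blend equals
    # alpha * BCE(hard labels) + (1 - alpha) * BCE(averaged teacher probabilities).
    targets = alpha * y + (1 - alpha) * soft
    student, student_params = fit_logistic(X, targets)  # compact linear student
    return student, student_params, teacher_params


# Demo on synthetic data: 950 "normal" points near the origin, 50 "anomalies" near (3, 3).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(950, 2)), rng.normal(3.0, 0.5, size=(50, 2))])
y = np.concatenate([np.zeros(950), np.ones(50)])
student, student_params, teacher_params = distill(X, y)
probs = student(X)
recall = float((probs[y == 1] > 0.5).mean())
print(f"teacher params: {teacher_params}, student params: {student_params}, "
      f"anomaly recall: {recall:.2f}")
```

The student in this sketch is trained on the original, imbalanced data, yet it receives soft targets from teachers that each saw a rebalanced view. This is one simple way to transfer the "balanced representation" idea into a smaller model (here 3 parameters versus 6 per teacher). A realistic setup would substitute neural networks, stronger resamplers such as SMOTE, and a tuned `alpha` and temperature.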