Integrating Explainable AI for Effective Malware Detection in Encrypted Network Traffic
Journal: arXiv
Published Date: Jan 9, 2025
Abstract
Encrypted network communication ensures confidentiality, integrity, and
privacy between endpoints. However, attackers are increasingly exploiting
encryption to conceal malicious behavior. Detecting unknown encrypted malicious
traffic without decrypting the payloads remains a significant challenge. In
this study, we investigate the integration of explainable artificial
intelligence (XAI) techniques to detect malicious network traffic. We employ
ensemble learning models to identify malicious activity using multi-view
features extracted from various aspects of encrypted communication. To
effectively represent malicious communication, we compiled a robust dataset
of 1,127 unique connections spanning 54 malware families, containing more
connections than any other available open-source dataset. Our models were
benchmarked against the CTU-13 dataset, achieving over 99% accuracy,
precision, and F1-score. Additionally, the eXtreme Gradient Boosting (XGB)
model achieved
99.32% accuracy, 99.53% precision, and 99.43% F1-score on our custom dataset.
By leveraging Shapley Additive Explanations (SHAP), we identified that the
maximum packet size, mean inter-arrival time of packets, and transport layer
security version used are the most critical features for the global model
explanation. Furthermore, we identified the key features that drive local
explanations of individual traffic samples in both datasets. These
insights provide a deeper understanding of the model decision-making process,
enhancing the transparency and reliability of detecting malicious encrypted
traffic.
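
To illustrate how a Shapley-based local explanation attributes a prediction
to individual features, the following is a minimal sketch in plain Python. It
is not the paper's pipeline: the scoring function `toy_score`, its thresholds
and weights, and the `SAMPLE` and `BASELINE` values are all hypothetical and
chosen only for illustration. The feature names mirror the three features
highlighted above. The sketch computes exact Shapley values by enumerating
feature coalitions against a single baseline, whereas SHAP implementations
for tree ensembles typically rely on faster, model-specific algorithms.

```python
from itertools import combinations
from math import factorial

# Feature names mirror those highlighted in the abstract.
FEATURES = ["max_packet_size", "mean_inter_arrival_time", "tls_version"]


def toy_score(x):
    """Hypothetical maliciousness score; a stand-in, NOT the paper's XGB model."""
    score = 0.0
    if x["max_packet_size"] > 1000:
        score += 0.4
    if x["mean_inter_arrival_time"] < 0.05:
        score += 0.3
    if x["tls_version"] < 1.2:
        score += 0.2
    # Interaction term: large packets over an old TLS version add extra score.
    if x["max_packet_size"] > 1000 and x["tls_version"] < 1.2:
        score += 0.1
    return score


def coalition_value(subset, sample, baseline):
    """Score when features in `subset` take the sample's values, others the baseline's."""
    mixed = {f: (sample[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_score(mixed)


def shapley_values(sample, baseline):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(set(subset) | {f}, sample, baseline)
                without_f = coalition_value(set(subset), sample, baseline)
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical traffic sample and benign-looking baseline (illustrative values only).
SAMPLE = {"max_packet_size": 1400, "mean_inter_arrival_time": 0.01, "tls_version": 1.0}
BASELINE = {"max_packet_size": 500, "mean_inter_arrival_time": 0.5, "tls_version": 1.3}

if __name__ == "__main__":
    for name, contribution in shapley_values(SAMPLE, BASELINE).items():
        print(f"{name}: {contribution:+.3f}")
```

The per-feature contributions sum to the difference between the sample's
score and the baseline's score, which is the additivity property that makes
such attributions usable as local explanations. Averaging the absolute
contributions over many samples is one common way to obtain a global feature
ranking.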