PotentRegion4MalDetect: Advanced Features from Potential Malicious Regions for Malware Detection
Journal: arXiv
Published Date: Jul 9, 2025
Abstract
Malware developers exploit the fact that most detection models extract
features from the entire binary rather than from the regions of potential
maliciousness. They reverse engineer a benign binary and inject malicious code
into it. This obfuscation technique circumvents malware detection models and
deceives ML classifiers because benign features far outnumber malicious
features. Extracting features from the potential malicious regions instead
improves accuracy and reduces false positives. Hence, we propose a novel model
named
PotentRegion4MalDetect that extracts features from the potential malicious
regions. PotentRegion4MalDetect determines the nodes with potential
maliciousness in the partially preprocessed Control Flow Graph (CFG) using the
malicious strings given by StringSifter. Then, it extracts advanced features of
the identified potential malicious regions alongside the features from the
completely preprocessed CFG. The features extracted from the completely
preprocessed CFG mitigate obfuscation techniques that attempt to disguise
malicious content, such as suspicious strings. Experiments show that
PotentRegion4MalDetect requires fewer entries to store the features for all
binaries than a model that processes the entire binary, which reduces memory
overhead and storage requirements and speeds up computation. These advanced
features give an 8.13% increase in SHapley Additive exPlanations (SHAP)
Absolute Mean and a 1.44% increase in SHAP Beeswarm value compared to those
extracted from the entire binary. The advanced features outperform the features
extracted from the entire binary, achieving more than 99% accuracy, precision,
recall, AUC, and F1-score, with an FPR of 0.064%.
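
The node-identification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the CFG is available as plain dictionaries (node to embedded strings, node to successors) and that the suspicious strings have already been obtained, for example from StringSifter's ranked output. The function names `flag_nodes` and `expand_region` are hypothetical, and the `hops` neighborhood used to turn flagged nodes into a "region" is an assumption, since the abstract does not say how regions are delimited.

```python
from collections import deque


def flag_nodes(node_strings, suspicious):
    """Return the CFG nodes whose strings contain any suspicious string.

    node_strings: dict mapping node id -> list of strings found in that node.
    suspicious:   iterable of strings judged suspicious (assumed to come from
                  a string-ranking tool; no tool is called here).
    """
    needles = [s.lower() for s in suspicious]
    return {
        node
        for node, strings in node_strings.items()
        if any(needle in s.lower() for s in strings for needle in needles)
    }


def expand_region(edges, seeds, hops=1):
    """Grow flagged nodes into a region by following successor edges.

    Assumption: a region is every node reachable from a flagged node within
    `hops` successor edges. This is an illustrative choice only.
    """
    region = set(seeds)
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for succ in edges.get(node, ()):
            if succ not in region:
                region.add(succ)
                frontier.append((succ, depth + 1))
    return region


# Toy CFG: two of the five basic blocks carry suspicious strings.
node_strings = {
    "entry": ["Hello"],
    "b1": ["cmd.exe /c del"],
    "b2": [],
    "b3": ["http://evil.example/payload"],
    "exit": [],
}
edges = {"entry": ["b1", "b2"], "b1": ["exit"], "b2": ["b3"], "b3": ["exit"]}

seeds = flag_nodes(node_strings, ["cmd.exe", "http://"])
region = expand_region(edges, seeds, hops=1)
```

In this toy graph, features would be computed over `region` rather than over all five nodes, which mirrors the abstract's point that restricting extraction to potential malicious regions reduces the number of stored entries.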