A malware classification method based on directed API call relationships.
Journal:
PloS one
PMID:
40096061
Abstract
In response to the growing complexity of network threats, researchers are increasingly turning to machine learning and deep learning techniques to develop advanced models for malware detection. Many existing methods that utilize Application Programming Interface (API) sequence instructions for malware classification often overlook the structural information inherent in these sequences. While some approaches consider the structure of API calls, they typically rely on the Graph Convolutional Network (GCN) framework, which tends to neglect the sequential nature of API interactions. To address these limitations, we propose a novel malware classification method that leverages the directed relationships within API sequences. Our approach models each API sequence as a directed graph, incorporating node attributes, structural information, and directional relationships. To effectively capture these features, we introduce First-order and Second-order Graph Convolutional Networks (FSGCN) to approximate the operations of a directed graph convolutional network (DGCN). The resulting directed graph embeddings from the FSGCN are then transformed into grayscale images and classified using a Convolutional Neural Network (CNN). Additionally, to mitigate the effects of imbalanced datasets, we employ the Synthetic Minority Over-sampling Technique (SMOTE), ensuring that underrepresented classes receive adequate attention during training. Our method has been rigorously evaluated through extensive experiments on two real-world malware datasets. The results demonstrate the effectiveness and superiority of our approach compared to traditional and graph-based malware classification techniques.