OpCode-Based Malware Classification Using Machine Learning and Deep Learning Techniques
Journal: arXiv
Published Date: Apr 18, 2025
Abstract
This technical report presents a comprehensive analysis of malware
classification using OpCode sequences. Two distinct approaches are evaluated:
traditional machine learning using n-gram analysis with Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), and Decision Tree classifiers; and a deep
learning approach employing a Convolutional Neural Network (CNN). The
traditional machine learning approach establishes a baseline using handcrafted
1-gram and 2-gram features from disassembled malware samples. The deep learning
methodology builds on the approach proposed in "Deep Android Malware Detection"
by McLaughlin et al. and evaluates a CNN model trained to extract features
automatically from raw OpCode data. Empirical results are
compared using standard performance metrics (accuracy, precision, recall, and
F1-score). While the SVM classifier outperforms the other traditional
classifiers, the CNN model achieves competitive performance with the added
benefit of automated feature extraction.
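
To make the handcrafted 1-gram and 2-gram features concrete, the following is a minimal sketch of how OpCode n-gram counts could be turned into fixed-length feature vectors for a classifier such as an SVM. It is an illustration under stated assumptions, not the report's implementation: the function names (opcode_ngrams, build_vocabulary, vectorize) and the toy OpCode sequences are hypothetical, and the report's actual preprocessing, vocabulary size, and feature weighting are not specified in the abstract.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count contiguous n-grams in one OpCode sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence shorter than n yields an empty range and thus no n-grams.
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def build_vocabulary(
    samples: Sequence[Sequence[str]], n_values: Sequence[int] = (1, 2)
) -> Dict[NGram, int]:
    """Assign a column index to every n-gram seen in the training samples."""
    vocab: Dict[NGram, int] = {}
    for opcodes in samples:
        for n in n_values:
            for gram in opcode_ngrams(opcodes, n):
                vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(
    opcodes: Sequence[str],
    vocab: Dict[NGram, int],
    n_values: Sequence[int] = (1, 2),
) -> List[int]:
    """Map one sample to a count vector; n-grams outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for n in n_values:
        for gram, count in opcode_ngrams(opcodes, n).items():
            index = vocab.get(gram)
            if index is not None:
                vector[index] = count
    return vector


if __name__ == "__main__":
    # Hypothetical toy sequences standing in for disassembled samples.
    training_samples = [
        ["push", "mov", "call", "mov", "ret"],
        ["mov", "mov", "add", "ret"],
    ]
    vocabulary = build_vocabulary(training_samples)
    features = [vectorize(sample, vocabulary) for sample in training_samples]
    for row in features:
        print(row)
```

The resulting count vectors could then be passed to any of the traditional classifiers named above. A CNN operating on raw OpCode sequences would skip this step, which is the automated feature extraction the abstract refers to.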