Publication Date
Fall 2023
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Fabio Di Troia
Second Advisor
Faranak Abri
Third Advisor
Navrati Saxena
Keywords
N-grams, Opcodes, Static Analysis, Word2Vec, Doc2Vec, FastText, SVM, RF, kNN, CNN
Abstract
Malware is a serious risk to any software application, whether it runs standalone or over a network. To protect computer systems, it is essential to detect and classify malware effectively. Modern malware classification research focuses on Machine Learning and Deep Learning techniques to identify advanced malicious software. This project explores malware classification by combining two robust methods: n-grams and word embedding. By extracting opcode n-grams, we make use of the sequential nature of malware execution to identify local patterns within the malware executable.
We use word embedding methods such as Word2Vec, Doc2Vec, and FastText to produce dense vector representations of these opcode n-grams in order to improve our feature representation. In our experimental framework, these feature extraction techniques are combined with a variety of classifiers: Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Convolutional Neural Networks (CNN). These combinations let us investigate the advantages and disadvantages of each classifier for malware classification and show how well each one works with different feature representations. Using this approach, we perform multi-class classification experiments. The findings of this research indicate that using opcode n-grams with word embedding is a promising solution for detecting and classifying real-world malware.
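The opcode n-gram step described above can be illustrated with a minimal sketch. This is not the project's implementation: the function name `opcode_ngrams`, the underscore-joined token format, and the sample opcode sequence are all assumptions made for illustration.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the overlapping n-grams of an opcode sequence.

    Each n-gram is joined into a single token (an assumed convention)
    so that it can be treated as one "word" by a later embedding step.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return ["_".join(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


# Hypothetical opcode sequence, standing in for a disassembled executable.
sample = ["push", "mov", "call", "push", "mov", "call", "ret"]

bigrams = opcode_ngrams(sample, n=2)
counts = Counter(bigrams)  # frequency of each local opcode pattern
```

Under these assumptions, each sample becomes a list of n-gram tokens, which is the kind of input a word embedding model such as Word2Vec, Doc2Vec, or FastText would take before the resulting vectors are passed to a classifier.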
Recommended Citation
Joshi, Siddhita, "Malware Classification using Opcode N-grams and Word Embeddings" (2023). Master's Projects. 1303.
DOI: https://doi.org/10.31979/etd.azjj-6nmk
https://scholarworks.sjsu.edu/etd_projects/1303