Publication Date
Spring 2020
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Mark Stamp
Second Advisor
Thomas Austin
Third Advisor
Fabio Di Troia
Keywords
Hybrid malware classification, HMM, Word2vec, SVM, k-NN, random forest, deep neural networks
Abstract
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on a wide variety of features, including opcode sequences, API calls, and byte n-grams, among many others. In this research, we implement hybrid machine learning techniques, where we train hidden Markov models (HMM) and compute Word2Vec encodings based on opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and deep neural network (DNN) classifiers. We conduct substantial experiments over a variety of malware families. Our results surpass those of comparable classification experiments.
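The two-stage idea in the abstract (derive a feature vector from each sample's opcode sequence, then pass it to a standard classifier) can be illustrated with a minimal sketch. This is not the project's implementation: as a simplifying assumption, a row-normalized first-order opcode transition matrix stands in for the trained HMM and Word2Vec features, and a hand-written k-NN stands in for the classifiers. The opcode alphabet, the sample sequences, the family labels, and the function names are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical toy opcode alphabet (assumption, not from the project).
OPCODES = ["mov", "push", "call", "xor"]

def transition_features(seq, alphabet=OPCODES):
    """Flatten the row-normalized opcode transition matrix of one sample.

    Stand-in for HMM/Word2Vec-derived features: an observable first-order
    Markov chain, not a hidden Markov model.
    """
    counts = Counter(zip(seq, seq[1:]))  # counts of adjacent opcode pairs
    features = []
    for a in alphabet:
        row_total = sum(counts[(a, b)] for b in alphabet)
        for b in alphabet:
            features.append(counts[(a, b)] / row_total if row_total else 0.0)
    return features

def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up opcode sequences with made-up family labels.
samples = [
    (["mov", "push", "call", "mov", "push", "call"], "family_a"),
    (["push", "call", "mov", "push", "call", "mov"], "family_a"),
    (["xor", "xor", "mov", "xor", "xor", "mov"], "family_b"),
    (["xor", "mov", "xor", "xor", "mov", "xor"], "family_b"),
]
train = [(transition_features(seq), label) for seq, label in samples]

# Stage 1: features from the unseen sample; stage 2: classify them.
query = transition_features(["mov", "push", "call", "mov", "push"])
predicted = knn_predict(train, query, k=1)
print(predicted)
```

In a fuller pipeline, the feature step would be replaced by parameters of a trained HMM or by Word2Vec embedding vectors, and the classifier by an SVM, random forest, or DNN, as the abstract describes.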
Recommended Citation
Kale, Aparna Sunil, "Malware Classification Based on Hidden Markov Model and Word2Vec Features" (2020). Master's Projects. 921.
DOI: https://doi.org/10.31979/etd.edkg-dtq8
https://scholarworks.sjsu.edu/etd_projects/921