Master of Science (MS)
Fabio Di Troia
Hybrid malware classification, HMM, Word2Vec, SVM, k-NN, random forest, deep neural networks
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on a wide variety of features, including opcode sequences, API calls, and byte n-grams, among many others. In this research, we implement hybrid machine learning techniques in which we train hidden Markov models (HMMs) and compute Word2Vec encodings based on opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and deep neural network (DNN) classifiers. We conduct extensive experiments over a variety of malware families. Our results surpass those of comparable classification experiments.
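As a rough illustration of the hybrid idea (a model trained on opcodes supplies the features, and a separate classifier makes the decision), the sketch below assumes that each sample has already been reduced to a trained HMM. It uses that model's emission matrix B as the sample's feature vector and feeds it to a plain k-NN classifier. The matrices, the three-opcode alphabet, the family labels, the helper names, and the row-sorting convention are all hypothetical. This is not presented as the project's own feature construction.

```python
import math
from collections import Counter

def flatten_emissions(B):
    """Turn an N x M HMM emission matrix (rows = hidden states,
    columns = opcodes) into one feature vector.

    Hidden-state order from independent training runs is arbitrary, so this
    sketch ASSUMES a simple convention: sort the rows in descending
    lexicographic order before flattening."""
    return [p for row in sorted(B, reverse=True) for p in row]

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k training vectors nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-state HMMs over a 3-opcode alphabet (e.g. mov, push, call).
train = [
    (flatten_emissions([[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]), "familyA"),
    (flatten_emissions([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]), "familyA"),
    (flatten_emissions([[0.1, 0.1, 0.8], [0.3, 0.6, 0.1]]), "familyB"),
    (flatten_emissions([[0.2, 0.1, 0.7], [0.2, 0.7, 0.1]]), "familyB"),
]

query = flatten_emissions([[0.75, 0.15, 0.1], [0.25, 0.45, 0.3]])
print(knn_predict(train, query, k=3))  # familyA
```

In this sketch, the SVM, random forest, or DNN classifiers mentioned above would simply replace `knn_predict`, while the HMM-derived feature vectors stay the same. Word2Vec embeddings could be slotted in the same way as an alternative feature vector.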
Kale, Aparna Sunil, "Malware Classification Based on Hidden Markov Model and Word2Vec Features" (2020). Master's Projects. 921.