Publication Date

Spring 2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Mark Stamp

Second Advisor

Thomas Austin

Third Advisor

Fabio Di Troia


Hybrid malware classification, HMM, Word2vec, SVM, k-NN, random forest, deep neural networks


Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on a wide variety of features, including opcode sequences, API calls, and byte ��-grams, among many others. In this research, we implement hybrid machine learning techniques, where we train hidden Markov models (HMM) and compute Word2Vec encodings based on opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), ��-nearest neighbor

(��-NN), random forest (RF), and deep neural network (DNN) classifiers. We conduct substantial experiments over a variety of malware families. Our results surpass those of comparable classification experiments.