Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Mark Stamp

Second Advisor

Thomas Austin

Third Advisor

Fabio Di Troia


machine learning, deep learning, malware detection


It is often claimed that the primary advantage of deep learning is that such models can continue to learn as more data is available, provided that sufficient computing power is available for training. In contrast, for other forms of machine learning it is claimed that models ‘‘saturate,’’ in the sense that no additional learning can occur beyond some point, regardless of the amount of data or computing power available. In this research, we compare the accuracy of deep learning to other forms of machine learning for malware detection, as a function of the training dataset size. We experiment with a wide variety of hyperparameters for our deep learning models, and we compare these models to results obtained using �-nearest neighbors. In these experiments, we use a subset of a large and diverse malware dataset that was collected as part of a recent research project.