Publication Date

Fall 2017

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Malware detection based on machine learning techniques is often treated as a problem specific to a particular malware family. In such cases, detection involves training and testing models for each malware family. This approach can generally achieve high accuracy, but it requires many classification steps, resulting in a slow, inefficient, and impractical process. In contrast, classifying samples as malware or benign based on a single model would be far more efficient. However, such an approach is extremely challenging, since extracting common features from a variety of malware families might result in a model that is too generic to be useful. In this research, we perform controlled experiments to determine the tradeoff between accuracy and the number of malware families modeled.