Publication Date

Fall 12-19-2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Mark Stamp

Second Advisor

Katerina Potika

Third Advisor

Samanvitha Basole

Abstract

A fundamental problem in malware research is malware detection, that is, distinguishing malware samples from benign samples. This problem becomes more challenging when we consider multiple malware families. A typical approach to this multi-family detection problem is to train a machine learning model for each malware family and score each sample against all models. The resulting scores are then used for classification. We refer to this approach as “cold fusion,” since we combine previously trained models; no retraining of these base models is required when additional malware families are considered. An alternative approach is to train a single model on samples from multiple malware families. We refer to this latter approach as “hot fusion,” since we must completely retrain the model whenever an additional family is included in our training set. In this research, we compare hot fusion and cold fusion, in terms of both accuracy and efficiency, as a function of the number of malware families considered. We use features based on opcodes and a variety of machine learning techniques.
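
The cold fusion idea (one model per family, score each sample against every model, classify from the scores) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation. The per-family "model" here is a hypothetical toy opcode-frequency scorer (the project itself uses a variety of machine learning techniques). The scores are combined with a simple argmax, which is only one possible way to use them for classification. The names `make_frequency_scorer` and `cold_fusion_classify` are illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical: a "trained model" is any function mapping an opcode
# sequence to a score (higher = more similar to that family).
Scorer = Callable[[List[str]], float]


def make_frequency_scorer(training: List[List[str]]) -> Scorer:
    """Toy stand-in for a per-family model: the score is the mean relative
    frequency, in the family's training data, of the sample's opcodes."""
    counts: Dict[str, int] = {}
    total = 0
    for seq in training:
        for op in seq:
            counts[op] = counts.get(op, 0) + 1
            total += 1

    def score(sample: List[str]) -> float:
        if not sample or total == 0:
            return 0.0
        return sum(counts.get(op, 0) / total for op in sample) / len(sample)

    return score


def cold_fusion_classify(sample: List[str], models: Dict[str, Scorer]) -> str:
    """Score the sample against every per-family model and return the
    family with the highest score (argmax over the score vector)."""
    scores = {family: model(sample) for family, model in models.items()}
    return max(scores, key=lambda family: scores[family])


if __name__ == "__main__":
    models = {
        "famA": make_frequency_scorer([["mov", "push", "call"], ["mov", "mov", "push"]]),
        "famB": make_frequency_scorer([["xor", "jmp", "xor"], ["add", "xor", "jmp"]]),
    }
    print(cold_fusion_classify(["mov", "push"], models))  # famA
    # Adding a family trains only one new model; famA and famB are untouched.
    models["famC"] = make_frequency_scorer([["nop", "nop", "ret"]])
    print(cold_fusion_classify(["nop", "ret"], models))  # famC
```

Adding a new family requires training just one more scorer and adding it to the dictionary, which is what makes cold fusion cheap to extend. A hot fusion counterpart would instead retrain a single multi-family classifier on the combined samples each time a family is added.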

Available for download on Saturday, December 19, 2020