Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Fabio Di Troia

Second Advisor

William Andreopoulos

Third Advisor

Genya Ishigaki

Keywords

Class-imbalance, Undersampling, Oversampling, Hybrid-sampling, Generative Adversarial Networks, Multilayer Perceptron, K-Nearest Neighbors, Support Vector Machine, Random Forest

Abstract

There have been many breakthroughs over the years in the field of Machine Learning to detect and classify malware threats. However, training a holistic machine learning model to effectively classify malware has been an ongoing topic of research. Datasets represent some malware types disproportionately, which can affect the performance of machine learning classifiers. Without ample data, less common but highly dangerous malware can go undetected by classifiers, leading to devastating outcomes. Data balancing techniques have proven to be effective in representing minority classes better and lessening the bias towards the majority class. Also, recent research showed that generative modeling effectively creates synthesized data that closely resemble original data. This paper explores various balancing techniques and generates synthetic opcode sequence data to effectively train machine learning models to better classify malware. We employ oversampling, undersampling, hybrid-sampling, and Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP) to generate fake data samples and compare their effectiveness in tackling the class imbalance problem in multi-class malware classification.

Recommended Citation

John, Ranjit, "Comparing Balancing Techniques for Malware Classification" (2024). Master's Projects. 1353.
DOI: https://doi.org/10.31979/etd.a56z-td5f
https://scholarworks.sjsu.edu/etd_projects/1353

Included in

Other Computer Engineering Commons
