Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Fabio Di Troia
Second Advisor
William Andreopoulos
Third Advisor
Sayma Akther
Keywords
Word2vec, DistilBERT, ELMo, fastText, GloVe, WGAN-GP, Diffusion, SMOTE, Random Forest Classifier, Support Vector Classifier, Multilayer Perceptron, t-SNE, Agglomerative Clustering
Abstract
Malware is software designed to damage or disrupt computer systems with the intent of harming the victim. Detecting malware and classifying it into malware families is a crucial problem for cybersecurity researchers. One of the major bottlenecks in improving these systems is the shortage of high-quality labeled malware data, especially for malware families with scarce samples. Researchers have used generative models to synthesize malware data to address this issue. Malware embeddings encode patterns within a malware file, and these patterns can be used to detect and classify malware. Recently, encouraging results have been obtained in generating malware embeddings with generative models. The experiments presented in this report aim to create high-quality malware opcode embeddings and then perform robust evaluations to assess their quality. The goal of the project is to generate embeddings that can be used to train malware detection and classification models.
Recommended Citation
Jain, Atishay, "Malware Opcode Embedding and Quality Assessment of Generative Sample Embeddings" (2025). Master's Projects. 1552.
DOI: https://doi.org/10.31979/etd.35r4-fh2n
https://scholarworks.sjsu.edu/etd_projects/1552