Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Fabio Di Troia
Second Advisor
Mark Stamp
Third Advisor
Faranak Abri
Keywords
Dimension reduction, malware visualization, natural language processing, machine learning
Abstract
Machine learning has become a popular and powerful tool for malware analysis and detection. With the rise of natural language processing (NLP) techniques, researchers can now extract contextual embeddings from malware opcode sequences, making it possible to analyze hidden malware patterns and advanced code obfuscation strategies. However, unlike malware binaries, which can be visualized directly as images, these embeddings exist in high-dimensional spaces, which makes their global patterns and spatial structures difficult to observe. In this paper, we propose a framework for visualizing malware embeddings in lower-dimensional space using various dimensionality reduction techniques. Our approach converts malware binaries into mnemonic opcode sequences, applies NLP models to generate embeddings, and projects these embeddings into lower-dimensional spaces for visualization. This enables us to evaluate how well different NLP techniques capture structural patterns across malware families. Experimental results show that Word2Vec outperforms BERT and GloVe in preserving both intra-family (local) and inter-family (global) structures in the reduced space. These findings are consistent with prior research highlighting Word2Vec’s effectiveness in generating meaningful malware representations. Our framework can serve as a visual evaluation metric that uses low-dimensional projections to assess the quality of malware embeddings, which aids in selecting the most suitable NLP technique for capturing the structural characteristics of malware.
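The three stages described in the abstract (opcode sequences, embeddings, low-dimensional projection) can be illustrated with a minimal, dependency-free sketch. This is not the project's implementation: the toy opcode sequences and family names are hypothetical, a simple opcode-frequency vector stands in for the learned Word2Vec, BERT, or GloVe embeddings, and plain PCA via power iteration stands in for whichever dimensionality reduction techniques the project evaluates.

```python
import math
import random

# Hypothetical toy data: family label -> opcode sequences. Illustrative only,
# not taken from the project's dataset.
SAMPLES = {
    "family_a": [
        ["mov", "push", "call", "mov", "ret"],
        ["push", "mov", "call", "call", "ret"],
    ],
    "family_b": [
        ["xor", "jmp", "xor", "add", "jmp"],
        ["xor", "add", "add", "jmp", "xor"],
    ],
}


def frequency_embeddings(sequences):
    """Stand-in embedding: relative opcode frequencies over a shared vocabulary.

    A learned model (e.g. Word2Vec) would replace this step in practice.
    """
    vocab = sorted({op for seq in sequences for op in seq})
    return [[seq.count(op) / len(seq) for op in vocab] for seq in sequences]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_component(matrix, iterations=200, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no variance left in the matrix
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return eigenvalue, v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        eigenvalue, vec = top_component(cov)
        components.append(vec)
        # Deflate: remove the component just found so the next pass finds the following one.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[dot(row, c) for c in components] for row in centered]


def project_samples(samples):
    """Return (family labels, 2-D points) for every sample, in insertion order."""
    labels = [family for family, seqs in samples.items() for _ in seqs]
    sequences = [seq for seqs in samples.values() for seq in seqs]
    return labels, pca_2d(frequency_embeddings(sequences))


if __name__ == "__main__":
    for label, (x, y) in zip(*project_samples(SAMPLES)):
        print(f"{label}: ({x:+.3f}, {y:+.3f})")
```

In this toy setting, samples from the same family land on the same side of the first principal axis, which is the kind of inter-family separation such a projection is meant to make visible. How well that separation holds for real embeddings depends on the embedding model and the reduction technique.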
Recommended Citation
Tran, Quang Duy, "MAPLE: Malware Analysis through Projection of Low-dimensional Embeddings" (2025). Master's Projects. 1558.
DOI: https://doi.org/10.31979/etd.qgzg-qvz6
https://scholarworks.sjsu.edu/etd_projects/1558