Publication Date

Fall 2023

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Fabio Di Troia

Second Advisor

Faranak Abri

Third Advisor

Navrati Saxena

Keywords

N-grams, Opcodes, Static Analysis, Word2Vec, Doc2Vec, FastText, SVM, RF, kNN, CNN

Abstract

Malware is a serious risk to any software application, whether it runs standalone or over a network. To protect computer systems, it is essential to detect and classify malware effectively. Modern malware classification research focuses on machine learning and deep learning techniques to identify advanced malicious software. This project explores malware classification by combining two robust methods: n-grams and word embeddings. By extracting opcode n-grams, we make use of the sequential nature of malware execution to identify local patterns within the malware executable.
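As a rough illustration of the opcode n-gram idea described above, the following minimal sketch slides a window of length n over an opcode sequence and counts the resulting n-grams. The function name `opcode_ngrams` and the sample opcode list are hypothetical and chosen only for this example; they are not taken from the project.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return every contiguous run of n opcodes as a tuple, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty list.
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


# Hypothetical opcode sequence, not extracted from any real executable.
sample = ["push", "mov", "call", "mov", "call", "ret"]

# Count how often each bigram occurs in the sequence.
bigram_counts = Counter(opcode_ngrams(sample, n=2))
```

Counting n-grams this way keeps local ordering information (which opcode follows which) that a plain opcode histogram would discard.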

We use word embedding methods such as Word2Vec, Doc2Vec, and FastText to produce dense vector representations of these opcode n-grams in order to improve our feature representation. In our experimental framework, these feature extraction techniques are combined with a variety of classifiers, such as Support Vector Machines (SVM), Random Forest (RF), k-Nearest Neighbors (k-NN), and Convolutional Neural Networks (CNN). With these combinations, we can investigate the advantages and disadvantages of various classifiers when it comes to malware categorization. Comparing classifiers provides important information about how well they work with different feature representations. Using this approach, we perform experiments for multi-class classification. The findings of this research indicate that using opcode n-grams with word embeddings is a promising solution to detect and classify real-world malware.
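To make the embedding-plus-classifier pipeline more concrete, here is a minimal sketch under strong assumptions. The embedding table is a hand-written toy that stands in for a trained Word2Vec, Doc2Vec, or FastText model; each sample is represented by averaging its n-gram vectors (one common choice, not necessarily the one used in the project); and the classifier is a 1-nearest-neighbour rule with cosine similarity, standing in for the k-NN classifier named above. All names, vectors, and family labels are hypothetical.

```python
import math

# Toy 2-dimensional vectors (hypothetical values, not learned from data).
EMBEDDINGS = {
    ("push", "mov"): [1.0, 0.0],
    ("mov", "call"): [0.8, 0.2],
    ("xor", "jmp"): [0.0, 1.0],
    ("jmp", "xor"): [0.1, 0.9],
}


def sample_vector(ngrams, table):
    """Average the vectors of the n-grams found in the table; skip unknown ones."""
    known = [table[g] for g in ngrams if g in table]
    if not known:
        raise ValueError("sample contains no n-grams with a known vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_label(query, labelled):
    """Return the label of the training vector most similar to the query (1-NN)."""
    return max(labelled, key=lambda item: cosine(query, item[0]))[1]


# Two hypothetical training samples, one per made-up malware family.
train = [
    (sample_vector([("push", "mov"), ("mov", "call")], EMBEDDINGS), "family_a"),
    (sample_vector([("xor", "jmp"), ("jmp", "xor")], EMBEDDINGS), "family_b"),
]

query = sample_vector([("mov", "call")], EMBEDDINGS)
predicted = nearest_label(query, train)
```

In a realistic setting the toy table would be replaced by vectors learned from a corpus of opcode sequences, and the nearest-neighbour rule by any of the classifiers listed in the abstract.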

Recommended Citation

Joshi, Siddhita, "Malware Classification using Opcode N-grams and Word Embeddings" (2023). Master's Projects. 1303.
DOI: https://doi.org/10.31979/etd.azjj-6nmk
https://scholarworks.sjsu.edu/etd_projects/1303

Included in

Other Computer Engineering Commons
