Publication Date

Spring 2021

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Fabio Di Troia

Second Advisor

William Andreopoulos

Third Advisor

Katerina Potika

Keywords

Natural Language Processing (NLP), Bidirectional Encoder Representations from Transformers (BERT), Word2Vec, Support Vector Machines (SVM), Logistic Regression, Multi-Layer Perceptron (MLP), Random Forests.

Abstract

Malware classification is used to distinguish unique types of malware from each other.

This project aims to carry out malware classification using word embeddings, which are used in Natural Language Processing (NLP) to identify and evaluate the relationships between the words of a sentence. Word embeddings are generated by BERT and Word2Vec for malware samples and used to carry out multi-class classification. BERT is a transformer-based, pre-trained NLP model that can be used for a wide range of tasks such as question answering, paraphrase generation and next sentence prediction. However, the attention mechanism of a pre-trained BERT model can also be used in malware classification, by capturing information about the relation between each opcode and every other opcode belonging to a malware family. Word2Vec generates word embeddings in which words with similar contexts lie closer together. The word embeddings generated by Word2Vec would therefore help classify malware samples as belonging to a certain family based on similarity. Classification will be carried out using classifiers such as Support Vector Machines (SVM), Logistic Regression, Random Forests and Multi-Layer Perceptron (MLP). The classification accuracy obtained with the embeddings generated by BERT can then be compared with the accuracy obtained with Word2Vec, which establishes a baseline for the results.
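
As a rough illustration of the pipeline described in the abstract, the sketch below turns opcode sequences into one feature vector per sample and assigns a family by similarity. It is only a toy under stated assumptions: the embedding values are hypothetical and hand-written rather than produced by Word2Vec or BERT, mean pooling is an assumed way of combining opcode vectors, and a nearest-centroid rule with cosine similarity stands in for the SVM, Logistic Regression, Random Forest and MLP classifiers named above. The opcode names, family labels and function names are all illustrative.

```python
import math
from collections import defaultdict

# Hypothetical 3-dimensional opcode embeddings. In a real pipeline these
# would come from a trained Word2Vec or BERT model, not be written by hand.
EMBEDDINGS = {
    "mov":  [0.9, 0.1, 0.0],
    "push": [0.8, 0.2, 0.1],
    "pop":  [0.7, 0.3, 0.1],
    "xor":  [0.1, 0.9, 0.2],
    "add":  [0.2, 0.8, 0.1],
    "call": [0.1, 0.2, 0.9],
    "jmp":  [0.0, 0.3, 0.8],
}


def sample_vector(opcodes):
    """Mean-pool the embeddings of the known opcodes in one sample."""
    vectors = [EMBEDDINGS[op] for op in opcodes if op in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known opcodes in sample")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fit_centroids(samples, labels):
    """Average the sample vectors of each family into one centroid."""
    grouped = defaultdict(list)
    for opcodes, label in zip(samples, labels):
        grouped[label].append(sample_vector(opcodes))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, opcodes):
    """Return the family whose centroid is most similar to the sample."""
    vector = sample_vector(opcodes)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Toy training data: opcode sequences with made-up family labels.
TRAIN_SAMPLES = [
    ["mov", "push", "pop", "mov"],
    ["push", "mov", "pop"],
    ["xor", "add", "xor", "add"],
    ["add", "xor", "add"],
]
TRAIN_LABELS = ["family_a", "family_a", "family_b", "family_b"]

centroids = fit_centroids(TRAIN_SAMPLES, TRAIN_LABELS)
prediction = predict(centroids, ["mov", "pop", "push", "add"])
print(prediction)
```

In a fuller experiment, the hand-written table would be replaced by learned embeddings and the nearest-centroid rule by one of the classifiers listed in the abstract, with the per-sample vectors used as input features.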

Recommended Citation

Alvares, Joel Lawrence, "Malware Classification with BERT" (2021). Master's Projects. 998.
DOI: https://doi.org/10.31979/etd.7n35-garb
https://scholarworks.sjsu.edu/etd_projects/998

Included in

Artificial Intelligence and Robotics Commons, Information Security Commons
