Faculty Research, Scholarly, and Creative Activity

Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo

Aparna Sunil Kale, San Jose State University
Vinay Pandya, San Jose State University
Fabio Di Troia, San Jose State University
Mark Stamp, San Jose State University

Publication Date

3-1-2023

Document Type

Article

Publication Title

Journal of Computer Virology and Hacking Techniques

DOI

10.1007/s11416-022-00424-3

Abstract

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features and apply word embedding techniques (specifically, Word2Vec, HMM2Vec, BERT, and ELMo) as a feature engineering step. The resulting embedding vectors are then used as features for classification algorithms. The classification algorithms that we employ are support vector machines (SVM), k-nearest neighbors (kNN), random forests (RF), and convolutional neural networks (CNN). We conduct substantial experiments involving seven malware families. Our experiments extend beyond previous related work in this field. We show that we can obtain slightly better performance than in comparable previous work, with significantly faster model training times.
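
The general shape of the pipeline described in the abstract (opcode sequences, then embedding vectors, then a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes toy opcode sequences and hypothetical family labels, substitutes simple co-occurrence count vectors for a learned embedding such as Word2Vec, and uses a small cosine-similarity nearest-neighbor vote in place of the kNN models evaluated in the paper. All function names are illustrative.

```python
from collections import Counter
from math import sqrt

# Toy opcode sequences with hypothetical family labels (illustrative data only).
TRAIN = [
    (["mov", "push", "call", "mov", "push", "call", "pop", "ret"], "familyA"),
    (["push", "call", "mov", "push", "call", "pop", "ret"], "familyA"),
    (["xor", "add", "jmp", "xor", "add", "jmp", "xor", "sub"], "familyB"),
    (["add", "xor", "jmp", "sub", "xor", "add", "jmp"], "familyB"),
]


def build_embeddings(sequences, window=2):
    """Map each opcode to a vector of co-occurrence counts within `window`.

    Assumption: this count-based vector is only a stand-in for a learned
    embedding such as Word2Vec; it is not what the paper uses.
    """
    vocab = sorted({op for seq in sequences for op in seq})
    index = {op: i for i, op in enumerate(vocab)}
    counts = {op: [0.0] * len(vocab) for op in vocab}
    for seq in sequences:
        for i, op in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[op][index[seq[j]]] += 1.0
    return counts


def sample_vector(seq, emb):
    """Average the embedding vectors of the known opcodes in one sample."""
    dim = len(next(iter(emb.values())))
    known = [emb[op] for op in seq if op in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def knn_predict(seq, emb, train_vectors, k=1):
    """Majority vote among the k training samples most similar to `seq`."""
    query = sample_vector(seq, emb)
    ranked = sorted(train_vectors, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Embed the opcodes, then turn each training sample into one feature vector.
emb = build_embeddings([seq for seq, _ in TRAIN])
train_vectors = [(sample_vector(seq, emb), label) for seq, label in TRAIN]
```

Calling `knn_predict(["push", "call", "pop", "ret"], emb, train_vectors)` classifies a new opcode sequence against the toy training set. Averaging per-opcode vectors is one simple way to obtain a fixed-length feature vector per sample; it is a design choice of this sketch, not a claim about how the paper aggregates embeddings.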

Department

Computer Science

Recommended Citation

Aparna Sunil Kale, Vinay Pandya, Fabio Di Troia, and Mark Stamp. "Malware classification with Word2Vec, HMM2Vec, BERT, and ELMo" Journal of Computer Virology and Hacking Techniques (2023): 1-16. https://doi.org/10.1007/s11416-022-00424-3
