Faculty Research, Scholarly, and Creative Activity

An Advanced Malware Detection System Based on NLP to Generate Genetic Markers

Quang Duy Tran, San Jose State University
Jaehyun Lim, Saratoga High School
Fabio Di Troia, San Jose State University

Publication Date

1-1-2024

Document Type

Conference Proceeding

Publication Title

Digest of Technical Papers - IEEE International Conference on Consumer Electronics

DOI

10.1109/ICCE59016.2024.10444433

Abstract

Malware researchers are increasingly employing machine learning to counter sophisticated obfuscation techniques in malware activities. However, relying solely on machine learning models, especially with limited malware samples, is suboptimal. To enhance model effectiveness, careful data preprocessing is essential. To aid machine learning models in their learning process, we disassemble malware binary files into mnemonic opcode sequences and leverage insights from Natural Language Processing, specifically Word2Vec and Doc2Vec. This methodology generates 'genetic markers' proficient at capturing inherent malware attributes, effectively bypassing opcode obfuscation. Using various machine learning models (Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, Multi-layer Perceptron, and Convolutional biLSTM), we achieve promising results in distinguishing among 20 discrete malware families. Our findings reveal that employing customized Word2Vec and Doc2Vec models for distinct malware family datasets enhances the precision of 'genetic markers.' This refinement boosts the learning capabilities of machine learning models, resulting in a robust detection rate of 99%.
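
The pipeline described in the abstract (opcode sequences turned into per-sample vectors, then classified by family) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' method: it substitutes normalized opcode-bigram frequencies for the Word2Vec/Doc2Vec embeddings and a nearest-centroid rule for the classifiers named above. The function names, family labels, and opcode sequences are all hypothetical.

```python
from collections import Counter
from math import sqrt

def bigram_profile(opcodes):
    """Normalized frequency of adjacent opcode pairs (a stand-in 'marker')."""
    pairs = Counter(zip(opcodes, opcodes[1:]))
    total = sum(pairs.values())
    return {p: c / total for p, c in pairs.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def family_centroid(samples):
    """Average the bigram profiles of all samples in one family."""
    total = Counter()
    for sample in samples:
        for pair, freq in bigram_profile(sample).items():
            total[pair] += freq
    return {pair: freq / len(samples) for pair, freq in total.items()}

def classify(opcodes, centroids):
    """Return the family whose centroid is most similar to the sample."""
    profile = bigram_profile(opcodes)
    return max(centroids, key=lambda fam: cosine(profile, centroids[fam]))

# Hypothetical toy data: two made-up families with made-up opcode traces.
TRAIN = {
    "family_a": [
        ["push", "mov", "call", "pop", "ret"],
        ["push", "mov", "call", "call", "pop", "ret"],
    ],
    "family_b": [
        ["xor", "add", "jmp", "xor", "add", "jmp"],
        ["xor", "add", "add", "jmp", "xor"],
    ],
}
CENTROIDS = {fam: family_centroid(s) for fam, s in TRAIN.items()}

if __name__ == "__main__":
    print(classify(["push", "mov", "call", "pop"], CENTROIDS))
    print(classify(["xor", "add", "jmp"], CENTROIDS))
```

A learned embedding such as Word2Vec would replace `bigram_profile` in a fuller version; the per-family centroid loosely mirrors the idea of fitting the representation to each family's data, though only as an analogy.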

Keywords

Embedding Vectors, Machine Learning, Malware Detection, Natural Language Processing

Department

Computer Science

Recommended Citation

Quang Duy Tran, Jaehyun Lim, and Fabio Di Troia. "An Advanced Malware Detection System Based on NLP to Generate Genetic Markers" Digest of Technical Papers - IEEE International Conference on Consumer Electronics (2024). https://doi.org/10.1109/ICCE59016.2024.10444433
