Faculty Research, Scholarly, and Creative Activity

A natural language processing approach to Malware classification

Ritik Mehta, San Jose State University
Olha Jurečková, Czech Technical University in Prague
Mark Stamp, San Jose State University

Publication Date

3-1-2024

Document Type

Article

Publication Title

Journal of Computer Virology and Hacking Techniques

Volume

Issue

DOI

10.1007/s11416-023-00506-w

First Page

173

Last Page

184

Abstract

Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks. In this research, we consider a hybrid architecture, where HMMs are trained on opcode sequences, and the resulting hidden states of these trained HMMs are used as feature vectors in various classifiers. In this context, extracting the HMM hidden state sequences can be viewed as a form of feature engineering that is somewhat analogous to techniques that are commonly employed in Natural Language Processing (NLP). We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset, with an HMM-Random Forest model yielding the best results.
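The feature-engineering step described in the abstract, turning an opcode sequence into a hidden state sequence, can be illustrated with a minimal sketch. It assumes Viterbi decoding as the way to recover hidden states, which the abstract does not specify, and it uses a hypothetical two-state HMM with hand-picked probabilities in place of a model trained on real opcode data. The names `OPCODES`, `START`, `TRANS`, `EMIT`, and `hidden_state_features` are illustrative and do not come from the article.

```python
import math


def _log(p):
    # Log that tolerates zero probabilities.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for an observation sequence."""
    n_states = len(start_p)
    # Work in log space to avoid underflow on long opcode sequences.
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model (assumption): 3 opcodes, 2 hidden states.
OPCODES = {"mov": 0, "add": 1, "jmp": 2}
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]


def hidden_state_features(opcodes):
    """Map opcode mnemonics to a hidden state sequence usable as a feature vector."""
    obs = [OPCODES[op] for op in opcodes]
    return viterbi(obs, START, TRANS, EMIT)


print(hidden_state_features(["mov", "add", "jmp"]))  # prints [0, 0, 1]
```

In a pipeline of the kind the abstract outlines, the HMM parameters would be learned from opcode sequences rather than fixed by hand, and the resulting hidden state vectors would be passed to a downstream classifier such as a Random Forest.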

Department

Computer Science

Recommended Citation

Ritik Mehta, Olha Jurečková, and Mark Stamp. "A natural language processing approach to Malware classification" Journal of Computer Virology and Hacking Techniques (2024): 173-184. https://doi.org/10.1007/s11416-023-00506-w

