Publication Date

1-1-2022

Document Type

Conference Proceeding

Publication Title

Communications in Computer and Information Science

Volume

1683 CCIS

DOI

10.1007/978-3-031-24049-2_2

First Page

22

Last Page

37

Abstract

Signature-based and anomaly-based techniques are the fundamental methods for detecting malware. In recent years, however, malware has become more complex and sophisticated, making these techniques less effective. For this reason, researchers have turned to state-of-the-art machine learning techniques to counter threats to information security. Nevertheless, even with machine learning models in place, a shortage of training data still prevents these models from performing at their peak. Generative models have previously been found to be highly effective at generating image-like data that closely follow the actual data distribution. In this paper, we apply generative modeling to opcode sequences and aim to generate malware samples by taking advantage of the contextualized embeddings from BERT. We obtained promising results when differentiating between real and generated samples. We observe that the generated malware has characteristics so similar to those of actual malware that classifiers have difficulty distinguishing between the two, falsely identifying the generated malware as actual malware almost of the time.

Keywords

BERT, GAN, Malware, Malware detection, Word embedding

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Department

Computer Science

Recommended Citation

Quang Duy Tran and Fabio Di Troia. "Word Embeddings for Fake Malware Generation" Communications in Computer and Information Science (2022): 22-37. https://doi.org/10.1007/978-3-031-24049-2_2

Faculty Research, Scholarly, and Creative Activity

Word Embeddings for Fake Malware Generation
