Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks
Publication Date
7-9-2024
Document Type
Article
Publication Title
EAI Endorsed Transactions on Internet of Things
Volume
10
DOI
10.4108/eetiot.6566
Abstract
The effectiveness of malicious-file detection relies heavily on the quality of the training dataset, particularly its size and authenticity. However, the lack of high-quality training data remains one of the biggest obstacles to the widespread adoption of malware detection based on trained machine learning and deep learning models. In response to this challenge, researchers have made initial strides by employing generative techniques to create synthetic malware samples. This work uses deep variational autoencoders (VAEs) and generative adversarial networks (GANs) to produce malware samples as opcode sequences. As validation, machine learning and deep learning classifiers are then used to distinguish the generated opcode sequences from authentic opcode samples. The primary objective of this study was to compare synthetic malware generated using VAE and GAN technologies. The results showed that neither approach could create synthetic malware that deceived machine learning classification. However, the WGAN-GP algorithm (Wasserstein GAN with gradient penalty) showed more promise: a higher number of its synthetic samples was required in the training set before they could be detected effectively, indicating that it is the better approach to synthetic malware generation.
Keywords
GAN, Malware, Synthetic Malware, VAE
Department
Computer Science
Recommended Citation
Aaron Choi, Albert Giang, Sajit Jumani, David Luong, and Fabio Di Troia. "Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks" EAI Endorsed Transactions on Internet of Things (2024). https://doi.org/10.4108/eetiot.6566