Poster: Synthetic Malware Generation using Generative Models

Tiffany Bao, Department of Computer Science
Kylie Trousil, University of Wisconsin-La Crosse
Quang Duy Tran, San Jose State University
Fabio Di Troia, San Jose State University
Younghee Park, San Jose State University

Abstract

Malware poses significant challenges to cybersecurity, and the scarcity of high-quality datasets exacerbates them. This paper explores the use of diffusion models and Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP) to generate synthetic malware samples represented as opcode sequences. The synthetic data, validated through classification metrics, improves malware detection accuracy across families, particularly when addressing zero-day attacks. The diffusion approach achieves up to 99.6% accuracy in multi-class classification, outperforming WGAN-GP, which reaches up to 96.1%. Incorporating synthetic samples improves detection accuracy for rare malware families by up to 100%, underscoring the potential of generative models to enhance malware detection.