Synthetic Malware Image Generation Based on Generative Models Against Zero-Day Attacks
Abstract
Malware detection is critical for protecting digital assets from attacks. However, traditional approaches to malware detection often struggle with limited datasets, which hinder the effectiveness of machine learning models. In this paper, we propose a malware image generation system that produces high-quality synthetic malware image samples to address the challenges posed by small datasets. The system converts malware binary files into images using four color spaces: monochrome, grayscale, RGB, and CMYK. It uses two popular generative models, WGAN-GP and a diffusion model, to generate synthetic malware images. The generated images are evaluated using key performance metrics such as accuracy, precision, and recall, and a cosine similarity evaluation system filters for high-quality samples to enhance data quality. Experimental results demonstrate that the synthetic samples improve the performance of malware detection models, indicating that synthetic image-based representations are effective for malware detection tasks.
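To make the two data-handling steps named in the abstract concrete, the following is a minimal Python sketch of a byte-to-grayscale conversion and a cosine similarity filter. It is illustrative only and is not the implementation used in this work: the function names, the fixed image width, the zero-padding of the last row, the choice of a single reference image, and the threshold value are all assumptions introduced here.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 16) -> np.ndarray:
    """Map each byte to one 8-bit pixel; zero-pad the final row (assumed scheme)."""
    arr = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(arr) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(arr)] = arr
    return padded.reshape(height, width)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two images treated as flat vectors."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Define the similarity as 0.0 when either vector is all zeros.
    return float(a @ b / denom) if denom else 0.0


def filter_by_similarity(synthetic, reference, threshold=0.9):
    """Keep synthetic images whose similarity to a reference image meets a
    hypothetical threshold."""
    return [
        img for img in synthetic
        if cosine_similarity(img, reference) >= threshold
    ]
```

In this sketch, `reference` could be a single real sample or an average of real samples of the same malware family. Which of these is appropriate, and what threshold to use, depends on the dataset and is left open here.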