Publication Date
1-1-2022
Document Type
Conference Proceeding
Publication Title
Communications in Computer and Information Science
Volume
1536 CCIS
DOI
10.1007/978-3-030-96057-5_1
First Page
3
Last Page
21
Abstract
In the past decade, the number of malware attacks has grown considerably and, more importantly, the attacks have evolved. Many researchers have successfully integrated state-of-the-art machine learning techniques to combat this ever-present and rising threat to information security. However, the lack of sufficient data to train these machine learning models appropriately remains a major challenge. Generative modelling has proven to be very efficient at generating image-like synthesized data that can match the actual data distribution. In this paper, we aim to generate malware samples as opcode sequences and attempt to differentiate them from the real ones, with the goal of building fake malware data that can be used to effectively train machine learning models. We use and compare different Generative Adversarial Network (GAN) algorithms and Hidden Markov Models (HMM) to generate such fake samples, obtaining promising results.
Keywords
Fake malware generation, GAN, HMM, Machine learning, Malware, Word embedding
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Computer Science
Recommended Citation
Harshit Trehan and Fabio Di Troia. "Fake Malware Generation Using HMM and GAN" Communications in Computer and Information Science (2022): 3-21. https://doi.org/10.1007/978-3-030-96057-5_1