Publication Date


Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Metamorphic virus employs code obfuscation techniques to mutate itself. It absconds from signature-based detection system by modifying internal structure without compromising original functionality. However, it has been proved that machine learning technique like Hidden Markov model (HMM) can detect such viruses with high probability. HMM is a state machine where each state observes the input data with appropriate observation probability. HMM learns statistical properties of “virus features” rather than “signatures” and relies on such statistics to detect same family virus. Each HMM is trained with variants of same family viruses that are generated by same metamorphic engine so that HMM can detect similar viruses with high probability when encountered later on. Previous HMM-based detection techniques have relied on opcode sequences which are obtained by disassembling the binary (executable) code. Such an approach is impractical, since the disassembly process is slow, and this process must be applied to each file when scanning for viruses. In this paper, we develop a practical HMM-based metamorphic virus detector. We efficiently parses a Windows PE file and generate an approximate opcode sequence which is then used for scoring against the HMM. The results show that our method produce opcode sequences effectively, eliminate timeconsuming disassembling phase, reduce training time of HMM by 70% and produce clear separation of scores between family virus and non-members.