Master of Science (MS)
Most commercial anti-virus software uses signature based techniques to detect whether a file is infected by a virus or not. However, signature based detection systems are unable to detect metamorphic viruses, since such viruses change their internal structure from generation to generation. Previous work has shown that hidden Markov models (HMMs) can be used to detect metamorphic viruses. In this technique, the code is disassembled and the resulting opcode sequences are used for training and detection. Due to the disassembly step, this process is not efficient enough to use when a decision has to be made in real time.
In this project, we explore whether dynamic programming can be used to speed up the process of disassembling, with minimal loss of accuracy. Dynamic programming is generally used to solve problems having two key attributes: optimal substructure and overlapping sub problems. During each iteration our algorithm reads part of the input stream from the executable file and determines assembly instructions, thus dividing problems into sub problems.
We have created a score matrix representing digraphs of the most common opcode instructions and we have implanted a dynamic program based on this scoring matrix. For various file sizes, we determine the time taken by our dynamic program and we show that our approach is significantly faster than a standard disassembler (OllyDbg). Finally, we analyze the accuracy of our results.
Shah, Abhishek, "Approximate Disassembly using Dynamic Programming" (2010). Master's Projects. 8.