Publication Date


Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science


Computer viruses and other forms of malware pose a threat to virtually any software system (with only a few exceptions). A computer virus is a piece of software which takes advantage of known weaknesses in a software system, and usually has the ability to deliver a malicious payload. A common technique that virus writers use to avoid detection is to enable the virus to change itself by having some kind of self-modifying code. This kind of virus is commonly known as a metamorphic virus, and can be particularly difficult to detect [17]. Existing virus detection software is continually being improved upon in order to keep up with the rising complexity of today’s modern computer viruses. A new approach to detecting metamorphic viruses, which is an extension of an idea posed in a student writing project from a previous semester [17], will be considered in this project. If a large set of viruses in one “family” of metamorphic viruses can be treated as simple sequences of op-codes, then sequence analysis techniques used in other fields of study like bioengineering [4] could be used to develop a profile hidden Markov model (HMM). This profile would then be used to score an arbitrary op-code sequence (i.e. a program which may or may not be in the virus family) – if the output score exceeds a designated threshold it could be concluded that the input sequence was likely to have been from that same virus family. One of the most common techniques to detect viruses is called signature detection, which involves an analysis of known viruses to find signatures, or strings of bytes, which are found in viruses and not in most non-malicious code. If the virus is metamorphic it could potentially be difficult to find a single signature that will consistently be found in every version of a metamorphic virus. Since a profile HMM would score the overall similarity in structure to a virus “family”, it could theoretically detect the virus even if a reliable signature cannot be created. In order to develop a profile HMM for a virus family, the first step is to create a multiple sequence alignment (MSA) for the set of family viruses; this can then be used to “train” the profile HMM. This paper will concentrate on the techniques for creating MSA’s for real world virus op-code sequences which will best match the virus family, as well as to discuss the overall plausibility of the idea of using a profile HMM to detect metamorphic viruses. Creating and testing the profile HMM to detect the viruses will be the subject of another student project.