Publication Date

Spring 2018

Degree Type

Master's Project


Computer Science


DNA barcoding is a method that uses an organism’s DNA to identify its species. The gene cytochrome c oxidase I (COI) has been used effectively as a DNA barcode to identify organisms and elucidate relationships among species [1]. The Barcode Of Life Database (BOLD) contains COI sequences used for DNA barcoding for more than 1 million species. Identifying a sample that has a match in BOLD is straightforward. However, this approach cannot identify samples that are absent from the database. Given a sample that is not represented in BOLD but is similar to a represented sequence, it would be valuable to classify the sample at a higher taxonomic rank. Since COI is represented as long character sequences of amino acids, Hidden Markov Models (HMMs) can be used to associate an unknown DNA sequence with a taxonomic rank. In this work, I show that dynamically created Profile HMMs are an effective tool for such identification.
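
To make the idea of scoring an unknown sequence against per-taxon profiles concrete, the following is a minimal sketch and not the method used in this project. It assumes equal-length, gap-free aligned sequences and keeps only the match-state emissions of a profile, so it is closer to a position-specific scoring matrix than to a full Profile HMM with insert and delete states. The taxon names, toy sequences, nucleotide alphabet, and function names are all hypothetical.

```python
import math
from collections import Counter

# Assumption: a nucleotide alphabet; pass a different alphabet for other encodings.
ALPHABET = "ACGT"


def build_profile(aligned_seqs, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate per-column emission log-probabilities from aligned sequences.

    Assumes all sequences have the same length and contain no gap characters.
    """
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append(
            {s: math.log((counts[s] + pseudocount) / total) for s in alphabet}
        )
    return profile


def log_odds(profile, seq, alphabet=ALPHABET):
    """Score a sequence against a profile relative to a uniform background."""
    background = math.log(1.0 / len(alphabet))
    return sum(column[s] - background for column, s in zip(profile, seq))


def classify(profiles, seq):
    """Return the name of the profile that gives the highest log-odds score."""
    return max(profiles, key=lambda name: log_odds(profiles[name], seq))


# Hypothetical toy alignments for two made-up taxa.
profiles = {
    "taxon_A": build_profile(["ACGTAC", "ACGTAT", "ACGAAC"]),
    "taxon_B": build_profile(["TTGCAA", "TTGCAG", "TTGCCA"]),
}

best = classify(profiles, "ACGTAA")
print(best)
```

A full Profile HMM would also model insertions and deletions relative to the alignment and would typically be scored with the Viterbi or forward algorithm; the sketch omits those parts to keep the scoring step readable.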