Publication Date
Spring 2018
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
Abstract
DNA barcoding is a method that uses an organism’s DNA to identify its species. The gene cytochrome c oxidase I (COI) has been used effectively as a DNA barcode to identify organisms and elucidate relationships among species [1]. The BOLD database (Barcode Of Life Database) contains COI sequences used for DNA barcoding for more than 1 million species. Identifying a sample that has a match in BOLD is straightforward. However, this lookup cannot identify samples that are absent from the database. Given a sample that is not represented in BOLD but is similar to a represented sequence, it would be valuable to classify the sample at a higher taxonomic rank. Since COI is represented as long character sequences of amino acids, Hidden Markov Models (HMMs) can be used to associate an unknown DNA sequence with a taxonomic rank. In this work, I show that dynamically created Profile HMMs are an effective tool for such identification.
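To illustrate the idea of scoring an unknown sequence against a profile HMM built for a taxonomic group, here is a minimal sketch in Python. It is not the project's implementation. It assumes a nucleotide alphabet for brevity, gap-free toy alignments, hand-picked transition probabilities, and uniform insert emissions. The taxon names, sequences, and function names (`build_match_emissions`, `viterbi_score`, `classify`) are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumed alphabet; amino-acid letters could be used instead
NEG_INF = float("-inf")

# Assumed, hand-picked transition probabilities between Match (M), Insert (I)
# and Delete (D) states; a real model would estimate them from the alignment.
TRANSITIONS = {
    "M": {"M": 0.90, "I": 0.05, "D": 0.05},
    "I": {"M": 0.80, "I": 0.15, "D": 0.05},
    "D": {"M": 0.80, "I": 0.05, "D": 0.15},
}
LOG_T = {(a, b): math.log(p) for a, row in TRANSITIONS.items() for b, p in row.items()}
LOG_BACKGROUND = math.log(1.0 / len(ALPHABET))  # uniform null model


def build_match_emissions(alignment, pseudocount=1.0):
    """Estimate log emission probabilities per column of a gap-free alignment."""
    emissions = []
    for col in range(len(alignment[0])):
        counts = {c: pseudocount for c in ALPHABET}
        for seq in alignment:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        emissions.append({c: math.log(n / total) for c, n in counts.items()})
    return emissions


def viterbi_score(emissions, query):
    """Log-odds score of the best state path for `query` through the profile."""
    L, n = len(emissions), len(query)
    # V[state][j][i]: best log probability ending in `state` at profile
    # position j after emitting the first i query characters.
    V = {s: [[NEG_INF] * (n + 1) for _ in range(L + 1)] for s in "MID"}
    V["M"][0][0] = 0.0  # the begin state is treated as M_0
    for j in range(L + 1):
        for i in range(n + 1):
            if j > 0 and i > 0:
                best = max(V[s][j - 1][i - 1] + LOG_T[(s, "M")] for s in "MID")
                # Characters outside ALPHABET fall back to the background.
                emit = emissions[j - 1].get(query[i - 1], LOG_BACKGROUND)
                V["M"][j][i] = best + emit
            if i > 0:
                best = max(V[s][j][i - 1] + LOG_T[(s, "I")] for s in "MID")
                V["I"][j][i] = best + LOG_BACKGROUND  # uniform insert emissions
            if j > 0:
                V["D"][j][i] = max(V[s][j - 1][i] + LOG_T[(s, "D")] for s in "MID")
    best_path = max(V[s][L][n] for s in "MID")
    return best_path - n * LOG_BACKGROUND  # compare against the null model


def classify(profiles, query):
    """Return the best-scoring taxon name and all scores for `query`."""
    scores = {name: viterbi_score(em, query) for name, em in profiles.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy alignments, one per taxonomic group.
    profiles = {
        "taxon_A": build_match_emissions(["ACGTACGT", "ACGTACGT", "ACGAACGT"]),
        "taxon_B": build_match_emissions(["TTGGCCAA", "TTGGCCAA", "TTGGCCAT"]),
    }
    print(classify(profiles, "ACGTCGT"))  # query with one base missing
```

In this sketch a query is assigned to whichever group's profile gives the highest log-odds score. Building one profile per genus, family, or order from the available sequences is one way such profiles could be created on demand, though the selection of sequences and the treatment of alignment gaps are left out here.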
Recommended Citation
Sharma, Vishrut, "Genetic Barcode Identification With Profile Hidden Markov Models" (2018). Master's Projects. 603.
DOI: https://doi.org/10.31979/etd.9fn2-bg55
https://scholarworks.sjsu.edu/etd_projects/603