Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Philip Heller

Second Advisor

Sami Khuri

Third Advisor

Wendy Lee

Keywords

Classification, cytochrome c oxidase subunit 1, DNA barcoding, genetic identification, profile hidden Markov models, taxonomy

Abstract

Genetic identification aims to address the shortcomings of morphological identification. By using the cytochrome c oxidase subunit 1 (COI) gene as the eukaryotic “barcode,” scientists hope to study species that are morphologically ambiguous, elusive, or otherwise difficult to identify visually. Current COI databases allow users to search only for existing database records. However, as the number of sequenced candidate COI genes grows, COI identification tools should ideally also be informative about novel, previously unreported sequences that may represent new species. If an unknown COI sequence does not belong to a reported organism, an ideal identification tool would report the taxonomic ranks to which the sequence most likely belongs. A potential solution is to dynamically create profile hidden Markov models (PHMMs): first at the genus level, then at the family level, moving to progressively higher taxonomic ranks until a significant score is found. This study experiments with creating PHMMs at the genus level, determines thresholds for classification, and assesses both the general performance of this method and the requirements for extending it to higher taxonomic groups. It concludes that the approach shows potential but may require additional data pre-processing and may be constrained by current computational limitations.
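The rank-by-rank search described in the abstract can be sketched roughly as follows. This is an illustrative simplification, not the project's implementation. It substitutes a gapless position-specific profile (match columns only, with no insert or delete states) for a full PHMM. It also assumes the reference sequences and the query are already aligned to the same length. The function names `build_profile`, `log_odds`, and `classify`, the per-rank `thresholds` values, and the toy lineages are all hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def build_profile(seqs, pseudocount=1.0):
    """Per-column base probabilities from pre-aligned, equal-length sequences.

    Simplification (assumption): match columns only. A real PHMM would also
    model insertions and deletions.
    """
    profile = []
    for i in range(len(seqs[0])):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            if s[i] in counts:  # skip gaps and ambiguous bases
                counts[s[i]] += 1
        total = sum(counts.values())
        profile.append({b: c / total for b, c in counts.items()})
    return profile


def log_odds(profile, query, background=0.25):
    """Sum of log2(P(base | profile) / P(base | uniform background))."""
    return sum(
        math.log2(col[base] / background)
        for col, base in zip(profile, query)
        if base in col
    )


def classify(query, references, ranks, thresholds):
    """Score the query against one profile per taxon, lowest rank first.

    references: list of (lineage dict, aligned sequence) pairs.
    Returns (rank, taxon, score) at the first rank whose best score meets
    thresholds[rank], or None if no rank does.
    """
    for rank in ranks:
        # Build one profile per taxon at this rank (the "on-demand" step).
        groups = defaultdict(list)
        for lineage, seq in references:
            groups[lineage[rank]].append(seq)
        best_taxon, best_score = None, -math.inf
        for taxon, seqs in groups.items():
            score = log_odds(build_profile(seqs), query)
            if score > best_score:
                best_taxon, best_score = taxon, score
        if best_score >= thresholds[rank]:
            return rank, best_taxon, best_score
    return None  # no significant match at any rank tried


# Hypothetical toy references: (lineage, aligned sequence).
refs = [
    ({"genus": "G1", "family": "F1"}, "ACGTACGT"),
    ({"genus": "G1", "family": "F1"}, "ACGTACGA"),
    ({"genus": "G2", "family": "F1"}, "TTGTACGT"),
    ({"genus": "G3", "family": "F2"}, "GGGGCCCC"),
]
# Genus-level hit on G1 with a score of about 7.4 bits.
print(classify("ACGTACGT", refs, ["genus", "family"], {"genus": 7.0, "family": 4.0}))
```

The thresholds above are arbitrary placeholders. The abstract describes determining classification thresholds experimentally at the genus level. Raising the genus threshold out of reach makes the same query fall through to a family-level assignment, which is the fallback behavior the abstract proposes for sequences from unreported genera.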

Recommended Citation

Sheu, Jessica, "Toward On-demand Profile Hidden Markov Models for Genetic Barcode Identification" (2019). Master's Projects. 671.
DOI: https://doi.org/10.31979/etd.qg3k-5ufh
https://scholarworks.sjsu.edu/etd_projects/671

Included in

Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons
