Publication Date

Fall 2015

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

T. Y. Lin

Second Advisor

Chris Tseng

Third Advisor

Howard Ho

Keywords

DNA, Species Similarity, Stochastic Finite Automata

Abstract

We consider the problem of identifying similarities between the DNA of different species. To do this, we infer a stochastic finite automaton (SFA) from given training data and compare it with test data. The training and test data consist of DNA sequences of different species. Our method first identifies sentences in DNA. To identify sentences, we read a DNA sequence one character at a time; three characters form a codon, and codons form proteins (also known as amino acid chains). Each amino acid in a protein belongs to a group. In total we have five groups: polar, non-polar, acidic, basic, and stop codons. A protein always starts with the start codon ATG, which belongs to the polar group, and ends with one of the stop codons, which belong to the stop codon group. After identifying sentences, our method converts them into a symbolic representation of strings in which each number represents the group to which an amino acid belongs. We then generate a prefix tree acceptor (PTA) and merge equivalent states to produce a stochastic finite automaton for the DNA.
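
The sentence identification and symbolic encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation. The function names (extract_sentences, encode), the numeric group codes, and the single-pass scanning strategy are all hypothetical. The amino acid grouping is an assumption based on a common textbook classification, except that methionine (ATG) is placed in the polar group to match the abstract.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO
    )
}

# ASSUMPTION: numeric codes and group membership are illustrative only.
# Methionine (M) is listed as polar because the abstract says ATG is polar.
GROUPS = {
    1: "STCYNQM",   # polar
    2: "GAVLIPFW",  # non-polar
    3: "DE",        # acidic
    4: "KRH",       # basic
    5: "*",         # stop codons
}
AA_TO_GROUP = {aa: code for code, aas in GROUPS.items() for aa in aas}


def extract_sentences(dna):
    """Return codon lists that start with ATG and end with a stop codon."""
    dna = dna.upper()
    sentences = []
    i = 0
    while i + 3 <= len(dna):
        if dna[i:i + 3] != "ATG":
            i += 1  # advance one character at a time until a start codon
            continue
        codons, j, closed = [], i, False
        while j + 3 <= len(dna):
            codon = dna[j:j + 3]
            aa = CODON_TABLE.get(codon)
            if aa is None:  # unknown symbol such as N: abandon this candidate
                break
            codons.append(codon)
            j += 3
            if aa == "*":
                closed = True
                break
        if closed:
            sentences.append(codons)
            i = j  # resume scanning after the stop codon
        else:
            i += 1
    return sentences


def encode(sentence):
    """Map each codon of a sentence to the numeric code of its group."""
    return [AA_TO_GROUP[CODON_TABLE[codon]] for codon in sentence]


if __name__ == "__main__":
    for sentence in extract_sentences("CCATGGCTGATAAATAGGG"):
        print(sentence, encode(sentence))
```

Under these assumptions, each sentence becomes a list of small integers that begins with the polar code and ends with the stop code, which is the form of input a PTA builder would consume.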

In addition to producing the SFA, we use secondary storage to handle very large DNA sequences. We also explain the concepts needed to understand this paper.
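
The starting point for the SFA, a prefix tree acceptor with frequency counts, might look like the following Python sketch. It is an illustration under stated assumptions, not the project's code: the names build_pta and transition_probability and the list-based state layout are hypothetical, the input is assumed to be lists of group codes, and neither the state-merging step nor the secondary-storage handling mentioned in the abstract is shown.

```python
def build_pta(strings):
    """Build a prefix tree acceptor with counts from symbol sequences.

    State 0 is the root. For each state we keep its outgoing transitions,
    how often each transition was used, how often the state was visited,
    and how many strings ended there.
    """
    trans = [{}]        # trans[state][symbol] -> next state
    edge_counts = [{}]  # edge_counts[state][symbol] -> times traversed
    visits = [0]        # visits[state] -> strings passing through state
    ends = [0]          # ends[state] -> strings terminating at state
    for s in strings:
        state = 0
        visits[0] += 1
        for sym in s:
            if sym not in trans[state]:
                # Create a fresh state for an unseen prefix.
                trans.append({})
                edge_counts.append({})
                visits.append(0)
                ends.append(0)
                trans[state][sym] = len(trans) - 1
            edge_counts[state][sym] = edge_counts[state].get(sym, 0) + 1
            state = trans[state][sym]
            visits[state] += 1
        ends[state] += 1
    return trans, edge_counts, visits, ends


def transition_probability(pta, state, sym):
    """Relative frequency of leaving `state` on `sym` (0.0 if unseen)."""
    _, edge_counts, visits, _ = pta
    if visits[state] == 0:
        return 0.0
    return edge_counts[state].get(sym, 0) / visits[state]


if __name__ == "__main__":
    # Hypothetical encoded sentences (group codes, ending in the stop code).
    pta = build_pta([[1, 2, 5], [1, 2, 5], [1, 3, 5]])
    print(len(pta[0]), "states")
    print(transition_probability(pta, 1, 2))
```

Storing counts rather than probabilities keeps the tree easy to update as more sentences arrive, and a merging procedure could compare the relative frequencies of two states before deciding whether to combine them.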

Recommended Citation

Shweta, Shweta, "PATTERN DISCOVERY IN DNA USING STOCHASTIC AUTOMATA" (2015). Master's Projects. 459.
DOI: https://doi.org/10.31979/etd.s3b9-kxex
https://scholarworks.sjsu.edu/etd_projects/459

Included in

Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons
