Publication Date

Spring 2016

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

T. Y. Lin

Second Advisor

Leonard Wesley

Third Advisor

Howard Ho

Keywords

coding versus non-coding dna grammatical inference

Abstract

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.

An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.

Testing shows that the accuracy of inferred languages for components of DNA are consistently accurate. By using the proposed algorithm languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.

To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly.

Recommended Citation

Cook, Cory, "DNA ANALYSIS USING GRAMMATICAL INFERENCE" (2016). Master's Projects. 493.
DOI: https://doi.org/10.31979/etd.ycca-7n7x
https://scholarworks.sjsu.edu/etd_projects/493

Download

Included in

Artificial Intelligence and Robotics Commons, Other Computer Sciences Commons

COinS

DOI

https://doi.org/10.31979/etd.ycca-7n7x

Master's Projects

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Publication Date

Degree Type

Degree Name

Department

First Advisor

Second Advisor

Third Advisor

Keywords

Abstract

Recommended Citation

Included in

DOI

Search

Browse All

Links

Master's Projects

DNA ANALYSIS USING GRAMMATICAL INFERENCE

Author

Publication Date

Degree Type

Degree Name

Department

First Advisor

Second Advisor

Third Advisor

Keywords

Abstract

Recommended Citation

Included in

Share

DOI

Search

Browse All

Links