Publication Date
Spring 2019
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Sami Khuri
Second Advisor
Natalia Khuri
Third Advisor
Philip Heller
Keywords
CRISPR array detection, LSTM
Abstract
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) is a sequence found in the DNA of an organism. It provides immunity to the organism. Recently, it was found that the CRISPR-based immunity mechanism can be manipulated to perform genome editing. However, the specificity of this system is hard to determine, which in turn makes it difficult to make the system highly specific. More research is required to improve CRISPR-based genome editing, and detecting CRISPR arrays in the DNA sequence is the first step towards this research. In this work, a CRISPR array detection pipeline, CRISPRLstm, is proposed. CRISPRLstm leverages the power of artificial intelligence to improve its performance over existing CRISPR array detection programs. This report explains why and how artificial intelligence, specifically Long-Short Term Memory (LSTM) models, can be used to tackle this problem effectively. The CRISPR arrays detected by CRISPRLstm are in good agreement with those found by other widely used and freely available CRISPR array detection tools. CRISPRLstm is available as a web tool. It visualizes the detected CRISPR arrays in a highly interactive interface, with options to view the secondary structure of the repeat and spacer sequences, BLAST them, create sequence logos of the repeat sequences, and more.
Recommended Citation
Deshmukh, Shantanu, "Detecting CRISPR Arrays Using Long-Short Term Memory Network" (2019). Master's Projects. 735.
DOI: https://doi.org/10.31979/etd.fdbk-cej6
https://scholarworks.sjsu.edu/etd_projects/735