Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Sami Khuri

Second Advisor

Natalia Khuri

Third Advisor

Philip Heller


Keywords

CRISPR array detection, LSTM

Abstract

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a type of sequence found in the DNA of an organism that provides the organism with immunity. Recently, it was found that the CRISPR-based immunity mechanism can be manipulated to perform genome editing. However, the specificity of this system is hard to determine, which in turn makes it difficult to make the system highly specific. More research is required to improve CRISPR-based genome editing, and detecting CRISPR arrays in a DNA sequence is the first step towards this research. In this work, a CRISPR array detection pipeline, CRISPRLstm, is proposed. CRISPRLstm leverages artificial intelligence to improve its performance over existing CRISPR array detection programs. This report explains why and how artificial intelligence, specifically Long Short-Term Memory (LSTM) models, can be used to tackle this problem effectively. The CRISPR arrays detected by CRISPRLstm are in good agreement with those reported by other widely used and freely available CRISPR array detection tools. CRISPRLstm is available as a web tool. It visualizes the detected CRISPR arrays in a highly interactive interface, with options to view the secondary structure of the repeat and spacer sequences, run BLAST searches on them, create sequence logos of the repeat sequences, and more.