Publication Date

Spring 5-24-2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Sami Khuri

Second Advisor

Philip Heller

Third Advisor

Wendy Lee


Researchers have been working towards the development of tools that facilitate the regular use of genome engineering techniques. In recent years, the focus of these efforts has been the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. These systems, which occur naturally in bacteria and archaea as an immunity mechanism, can be used for genome engineering in eukaryotes.

There are three major computational challenges associated with the use of CRISPR/Cas9 for genome engineering in mammals: identification of CRISPR arrays, single guide RNA design, and minimization of off-target effects. This project addresses the problem of single guide RNA design using a novel approach.

Researchers have approached this problem with various machine learning classification algorithms, trained on the sequential and structural properties of single guide RNAs (sgRNAs). This project explores a neural-network-based approach to the sgRNA design problem. A form of Recurrent Neural Network (RNN) called the Long Short-Term Memory (LSTM) model can be used as a featureless classification model to distinguish functional from non-functional sgRNAs.
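A featureless model consumes the raw sequence rather than hand-crafted features. The sketch below illustrates that idea under stated assumptions: each base of an sgRNA spacer is one-hot encoded over A, C, G, T (an assumed encoding; the abstract does not specify one), the vectors are fed one position at a time through a single LSTM layer, and the final hidden state is mapped to a probability that the sgRNA is functional. The class name `TinyLSTM`, the hidden size of 8, and the random, untrained weights are all hypothetical and chosen only to show the data flow; this is not the project's implementation.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element one-hot vectors (A, C, G, T)."""
    vectors = []
    for base in seq.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected base: {base!r}")
        vectors.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return vectors


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinyLSTM:
    """Hypothetical single-layer LSTM with a logistic output; weights are random and untrained."""

    def __init__(self, input_size=4, hidden_size=8, seed=0):
        rng = random.Random(seed)
        concat = input_size + hidden_size

        def mat():
            return [[rng.uniform(-0.1, 0.1) for _ in range(concat)]
                    for _ in range(hidden_size)]

        # One weight matrix and bias vector per gate: input, forget, output, candidate.
        self.W = {gate: mat() for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifoc"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.hidden_size = hidden_size

    def predict(self, seq):
        """Return a score in (0, 1) read as P(functional sgRNA)."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in one_hot(seq):
            z = x + h  # concatenate the current input with the previous hidden state
            pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                   for gate in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            # Cell state keeps part of the old memory and adds gated new information.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Map the final hidden state to a single probability.
        return sigmoid(sum(w * hj for w, hj in zip(self.w_out, h)))


model = TinyLSTM()
score = model.predict("GACGCATAAAGATGAGACGC")  # arbitrary 20-nt example spacer
```

With untrained weights the score is meaningless; in practice the weights would be learned from sgRNAs labeled functional or non-functional, for example with a deep learning framework's LSTM layer rather than this hand-written loop.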

The project covers experiments with Support Vector Machine and Random Forest classifiers that use sequential and structural features to identify the most potent sgRNAs in a given set of input sgRNAs. It also summarizes the implementation of the LSTM model and its results, along with the cross-validation results for each of these models. These results show that the LSTM performs better than existing models such as Random Forest classifiers and Support Vector Machines, and gives results comparable to existing tools.
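By contrast, the Support Vector Machine and Random Forest classifiers need an explicit feature vector for each sgRNA. A minimal sketch of sequential features, assuming position-specific nucleotide indicators plus overall GC content (a common choice, but the abstract does not list the project's actual feature set, and structural features such as folding energy are omitted here), might look like this; the function name `sequence_features` is hypothetical.

```python
NUCLEOTIDES = "ACGT"


def sequence_features(seq):
    """Return a dict of position-specific nucleotide indicators plus GC content."""
    seq = seq.upper()
    features = {}
    # One binary indicator per (position, nucleotide) pair, 1-based positions.
    for pos, base in enumerate(seq, start=1):
        for n in NUCLEOTIDES:
            features[f"pos{pos}_{n}"] = 1 if base == n else 0
    # Fraction of G and C bases in the whole sequence.
    gc = sum(1 for b in seq if b in "GC")
    features["gc_content"] = gc / len(seq) if seq else 0.0
    return features


example = sequence_features("GACGCATAAAGATGAGACGC")
```

The resulting vectors, taken in a fixed key order, could then be passed to a classifier such as scikit-learn's `SVC` or `RandomForestClassifier`.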

Available for download on Sunday, May 24, 2020