Publication Date

Spring 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Sami Khuri

Second Advisor

Philip Heller

Third Advisor

Wendy Lee


Keywords

CRISPR, CNNs, SVMs, Logistic Regression

Abstract


Advances in genome engineering have given researchers several methods for site-specific gene editing. One of these is the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas technology. It consists of a Cas9 nuclease, which cleaves the DNA, and a single guide RNA (sgRNA), which directs the nuclease to the intended target site. However, the target genome may contain multiple potential off-target sites, and cleavage at an off-target site can have deleterious effects, particularly when editing human genes.

Lab-based assays have been developed to test the off-target effects of guide RNAs. However, these assays are too costly and labor-intensive to scale. Machine learning models that compute the off-target potential make these calculations cheaper and more scalable. Both classification and regression can be used to solve this problem. In this project, we explore three classification models: Support Vector Machines (SVM), Logistic Regression, and Convolutional Neural Networks (CNN).
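To make the classification framing concrete, the sketch below trains a small logistic regression classifier on guide/site pairs. It is an illustration only, not the project's actual pipeline: the per-position mismatch features, the helper names (`mismatch_features`, `mutate`, `train_logistic_regression`), and above all the toy labeling rule (a site counts as "cleaved" when it has at most three mismatches) are assumptions made for this example. The data are synthetic and carry no biological meaning.

```python
import math
import random

BASES = "ACGT"

def mismatch_features(guide, site):
    """Return one 0/1 feature per position: 1.0 where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [0.0 if g == s else 1.0 for g, s in zip(guide, site)]

def mutate(guide, n_mismatches, rng):
    """Copy the guide and substitute a different base at n_mismatches positions."""
    site = list(guide)
    for pos in rng.sample(range(len(guide)), n_mismatches):
        site[pos] = rng.choice([b for b in BASES if b != guide[pos]])
    return "".join(site)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class under the logistic model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_logistic_regression(X, y, epochs=200, lr=0.5):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            grad_b += error
            for i, x in enumerate(features):
                grad_w[i] += error * x
        bias -= lr * grad_b / len(X)
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    return weights, bias

# Synthetic toy data (assumption): one random 20-nt guide, 200 mutated sites.
rng = random.Random(0)
guide = "".join(rng.choice(BASES) for _ in range(20))
X, y = [], []
for _ in range(200):
    k = rng.randint(0, 6)                      # number of mismatches in this site
    X.append(mismatch_features(guide, mutate(guide, k, rng)))
    y.append(1 if k <= 3 else 0)               # toy labeling rule, not real biology

weights, bias = train_logistic_regression(X, y)

# Score a perfect match and a site with six mismatches.
p_perfect = predict_proba(weights, bias, mismatch_features(guide, guide))
p_distant = predict_proba(weights, bias, mismatch_features(guide, mutate(guide, 6, rng)))
print(f"perfect match: {p_perfect:.3f}, six mismatches: {p_distant:.3f}")
```

The same feature vectors could be passed to an SVM, and a CNN would more typically consume a richer encoding such as a one-hot matrix of the aligned guide and site; both are left out here to keep the sketch dependency-free.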