Publication Date

Fall 2021

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Nada Attar

Second Advisor

Philip Heller

Third Advisor

Bilal Alsallakh


The nitrogenase iron protein (NifH) is extensively used to study nitrogen fixation, the ecologically vital process of reducing atmospheric nitrogen to a bioavailable form. The discovery rate of novel NifH sequences is high, and there is an ongoing need for software tools to mine NifH records from the GenBank repository. Since record annotations are unreliable, because they contain errors, classifiers based on sequence alone are required. The ARBitrator classifier is highly successful but must be initialized by extensive manual effort. A Deep Learning approach could substantially reduce manual intervention. However, attempts to build a character-based Deep Learning NifH classifier were unsuccessful. We hypothesized that we could generate visual representations of protein sequences and use a Convolutional Neural Network to classify the representations. Here we present the resulting classifier, which has achieved false positive and false negative rates of 0.19% and 0.22%, respectively.