Master of Science (MS)
The nitrogenase iron protein (NifH) is extensively used to study nitrogen fixation, the ecologically vital process of reducing atmospheric nitrogen to a bioavailable form. The discovery rate of novel NifH sequences is high, and there is an ongoing need for software tools to mine NifH records from the GenBank repository. Because record annotations contain errors and are therefore unreliable, classifiers based on sequence alone are required. The ARBitrator classifier is highly successful but requires extensive manual effort to initialize. A Deep Learning approach could substantially reduce manual intervention. However, attempts to build a character-based Deep Learning NifH classifier were unsuccessful. We hypothesized that we could generate visual representations of protein sequences and use a Convolutional Neural Network to classify the representations. Here we present the resulting classifier, which has achieved false positive and false negative rates of 0.19% and 0.22%, respectively.
Rez, Amer, "Nitrogenase Iron Protein Classification using CNN Neural Network" (2021). Master's Projects. 1049.