Publication Date

Summer 2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Jon Pearce

Second Advisor

Brooke Lustig

Third Advisor

Sami Khuri


secondary DNA structure switches, machine learning, regression


Ligands can bind at specific protein locations, inducing conformational changes such as those involving secondary structure. Identifying these possible switches from sequence, including homology, is an important ongoing area of research. We attempt to predict possible secondary structure switches from sequence in proteins using machine learning, specifically a logistic regression approach with 48 N-acetyltransferases as our learning set and 5 sirtuins as our test set. Validated residue binary assignments of 0 (no change in secondary structure) and 1 (change in secondary structure) were determined (DSSP) from 3D X-ray structures for sets of virtually identical chains crystallized under different conditions. Our sequence descriptors include amino acid type, six and twenty-term sequence entropy, Lobanov-Galzitskaya’s residue disorder propensity, Vkabat (variablility with respect to predictions from sequence of helix, sheet and other), and all possible combinations. We find the optimal AUC values approaching 70% for the two models of just residue disorder propensity and separately Vkabat. We hope to follow up with a larger learning set and using residue charge as an additional descriptor.