Publication Date
Summer 2019
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Jon Pearce
Second Advisor
Brooke Lustig
Third Advisor
Sami Khuri
Keywords
secondary DNA structure switches, machine learning, regression
Abstract
Ligands can bind at specific protein locations and induce conformational changes, including changes in secondary structure. Identifying these possible switches from sequence, including homology, is an important ongoing area of research. We attempt to predict possible secondary structure switches in proteins from sequence using machine learning, specifically logistic regression, with 48 N-acetyltransferases as our learning set and 5 sirtuins as our test set. Validated binary residue assignments of 0 (no change in secondary structure) and 1 (change in secondary structure) were determined with DSSP from 3D X-ray structures for sets of virtually identical chains crystallized under different conditions. Our sequence descriptors include amino acid type, six- and twenty-term sequence entropy, Lobanov-Galzitskaya’s residue disorder propensity, Vkabat (variability with respect to predictions from sequence of helix, sheet, and other), and all possible combinations of these. The best AUC values, approaching 70%, were obtained for two single-descriptor models: residue disorder propensity alone and Vkabat alone. We hope to follow up with a larger learning set and with residue charge as an additional descriptor.
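The workflow summarized in the abstract (per-residue descriptors, a logistic regression fit on a learning set, AUC on a separate test set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code. The six-class residue grouping, the toy descriptor values, the labels, and all function names are hypothetical, and the fit uses plain gradient descent rather than whatever software the project used.

```python
import math
from collections import Counter

# ASSUMPTION: one possible six-class reduced alphabet; the abstract does not
# say which grouping defines the "six-term" entropy.
SIX_GROUPS = {}
for label, residues in enumerate(["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]):
    for r in residues:
        SIX_GROUPS[r] = label

def column_entropy(column, groups=None):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.
    With groups=None each amino acid is its own class (twenty-term);
    with a grouping dict, residues are first mapped to classes."""
    symbols = [groups[r] if groups else r for r in column if r != "-"]
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Single-descriptor logistic regression fit by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys))
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def auc(scores, labels):
    """AUC as the probability that a random positive (label 1) scores
    higher than a random negative (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# HYPOTHETICAL toy data: one descriptor value per residue (for example a
# disorder propensity) and a 0/1 label for "secondary structure changed".
train_x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
test_x = [0.15, 0.35, 0.55, 0.75]
test_y = [0, 0, 1, 1]

w, b = fit_logistic(train_x, train_y)
test_scores = [sigmoid(w * x + b) for x in test_x]
test_auc = auc(test_scores, test_y)

print("twenty-term entropy:", column_entropy("ACDE"))
print("six-term entropy:", column_entropy("AVLI", SIX_GROUPS))
print("test AUC:", test_auc)
```

The toy test set is cleanly separable, so its AUC says nothing about real performance. With real descriptors, each residue would contribute one row, and multi-descriptor models would extend the single weight `w` to a weight vector.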
Recommended Citation
Strauss, Benjamin, "PREDICTING SWITCH-LIKE BEHAVIOR IN PROTEINS USING LOGISTIC REGRESSION ON SEQUENCE-BASED DESCRIPTORS" (2019). Master's Projects. 829.
DOI: https://doi.org/10.31979/etd.g9yf-st4y
https://scholarworks.sjsu.edu/etd_projects/829