Publication Date

Spring 2013

Degree Type


Degree Name

Master of Science (MS)




Brooke Lustig


Query-Based Qualitative Predictors, Sequence Homology, Solvent Accessibility Prediction

Subject Areas



Relative solvent accessibility (RSA) is central to classifying a given protein residue as surface-exposed or buried. This information is useful for studying protein structure and protein-protein interactions, and predicting it is usually the first step in predicting three-dimensional (3D) protein structures.

Various complicated and time-consuming methods, such as machine learning, have been applied to solvent-accessibility prediction. In this thesis, we present a simple application of linear regression that uses various sequence homology values for each residue together with query-residue qualitative predictors corresponding to each of the 20 amino acids. The qualitative predictors describe the actual query residue type (e.g., Gly), as opposed to the measures of sequence homology for the aligned subject residues. First, a fit was generated by applying linear regression to training sets with a variety of sequence homology parameters, including various sequence entropies, and the residue qualitative predictors. The coefficients obtained from the training sets were then applied to the test set to produce predicted RSA values. Prediction accuracies were calculated by comparing the predicted RSA values with RSA values computed by NACCESS from X-ray crystal structures. Using the qualitative predictors yielded significant prediction accuracy.
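The fit-then-predict procedure described above can be illustrated with a minimal sketch. It is not the thesis's actual implementation: the function names, the use of a single entropy column, the synthetic data, and the 20% RSA cutoff for the two-state (exposed versus buried) accuracy are all assumptions made here for illustration. The qualitative predictors are encoded as 20 indicator columns, one per amino acid, so each residue type receives its own offset in place of a shared intercept.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 residue types


def design_matrix(residues, homology):
    """Build [20 residue indicator columns | homology columns] for each position.

    `homology` may be 1-D (one value per residue, e.g. a sequence entropy)
    or 2-D (several homology measures per residue).
    """
    n = len(residues)
    onehot = np.zeros((n, len(AMINO_ACIDS)))
    for row, aa in enumerate(residues):
        onehot[row, AMINO_ACIDS.index(aa)] = 1.0
    homology = np.asarray(homology, dtype=float).reshape(n, -1)
    return np.hstack([onehot, homology])


def fit(residues, homology, rsa):
    """Ordinary least-squares fit of RSA on the training set."""
    X = design_matrix(residues, homology)
    coef, *_ = np.linalg.lstsq(X, np.asarray(rsa, dtype=float), rcond=None)
    return coef


def predict(coef, residues, homology):
    """Apply training-set coefficients to another set of residues."""
    return design_matrix(residues, homology) @ coef


def two_state_accuracy(predicted, observed, threshold=20.0):
    """Fraction of residues assigned the same exposed/buried state.

    The 20% threshold is an assumed cutoff, not one taken from the thesis.
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((predicted >= threshold) == (observed >= threshold)))


if __name__ == "__main__":
    # Synthetic stand-in data; real inputs would come from alignments and NACCESS.
    rng = np.random.default_rng(0)
    train_res = list(AMINO_ACIDS) * 10
    train_ent = rng.uniform(0.0, 4.0, size=len(train_res))
    train_rsa = 10.0 * train_ent + rng.normal(0.0, 5.0, size=len(train_res))

    coef = fit(train_res, train_ent, train_rsa)

    test_res = list("GAVLK")
    test_ent = [0.5, 1.0, 2.0, 3.0, 3.5]
    print(predict(coef, test_res, test_ent))
```

Dropping the separate intercept avoids the collinearity that would arise if all 20 indicator columns were combined with a constant column; an equivalent choice would be to keep the intercept and omit one residue type as the reference.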