Master of Science (MS)
Chemistry; Biochemistry; Bioinformatics
The structure of a protein ultimately determines its function; therefore, knowledge of a protein's three-dimensional structure is essential for understanding its function and mechanism of action. The two most common methods for determining protein structure are X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. These methods are quite successful but can be very time-intensive and costly. An alternative is protein structure prediction, in which structure is computationally predicted from the amino acid sequence. Unlike X-ray crystallography and NMR spectroscopy, protein structure prediction is not encumbered by potential experimental problems. In this research, we investigated whether certain protein structure features, known as tertiary contacts, can improve the prediction of protein three-dimensional structure. By calculating and analyzing sequence homology and related values, we showed that tertiary contacts, which typically are long-range interactions between amino acids separated by at least 10 residues in sequence, generally have lower pair-averaged sequence homology-based values. From these calculations, we created a filter, based on known literature-derived tertiary contacts, that predicts whether amino acid residues are buried or on the surface of a protein. When this tertiary contact prediction filter was applied, approximately 80% of the amino acid residues in our protein learning set were correctly filtered as being on the surface of a protein. These results imply that tertiary contacts are more conserved, densely packed, and less likely to be on the surface of a protein. We hope that tertiary contacts, through this prediction filter, can be used in conjunction with other prediction approaches to predict more accurately where amino acids are located in a protein.
Nguyen, Trung Thanh, "Utilization of Protein Tertiary Contacts to Improve Protein Structure Prediction Using Sequence Homology" (2012). Master's Theses. 4245.