Publication Date

Fall 12-21-2020

Degree Type

Master's Project

Degree Name

Master of Science in Bioinformatics (MSBI)


Computer Science

First Advisor

Wendy Lee


Most biologically active proteins of eukaryotic cells are initially synthesized in the secretory pathway as inactive precursors and require proteolytic processing to become functionally active. This processing is performed by a specialized family of endogenous enzymes known as proprotein convertases (PCs). Within this family of proteases, the best-known and most thoroughly researched member is furin. Furin is found ubiquitously throughout the human body, and its typical substrates are cleaved at sites composed of paired basic amino acids, specifically at the consensus sequence R-X-[K/R]-R↓. Many pathogens, such as enveloped viruses, exploit furin for the proteolytic processing and maturation of their proteins. The glycoproteins of enveloped viruses often carry the essential basic residues, arginine or lysine, at their recognition site, which permits cleavage and subsequent activation by furin. Recent biochemical research suggests that the furin cleavage site spans about 20 residues, from P14 to P6', and that variations at this site affect viral pathogenicity. Predicting furin cleavage sites in viral substrates is therefore an attractive area of research. Although methods for predicting furin cleavage sites exist, no virus-specific model is currently available. This project describes two methods for predicting furin cleavage sites in viral envelope glycoproteins: one based on profile Hidden Markov Models (HMMs) and one based on logistic regression. The logistic regression model was constructed using the hydrophobicity of amino acid residues relative to their position in the motif site. On independent sequences, the profile HMM predicts furin cleavage sites with a sensitivity of 87% and an accuracy of 89%, while the logistic regression model achieves a sensitivity of 60% and an accuracy of 91%. A Python-based prediction tool called FindFur was built on the profile HMM and is publicly available at
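
The abstract names several concrete ingredients: the R-X-[K/R]-R↓ consensus motif, a 20-residue P14 to P6' window around the cut, per-position hydrophobicity features for a logistic regression model, and sensitivity/accuracy as evaluation metrics. The Python sketch below shows one way these pieces could fit together. It is not FindFur and does not implement the profile HMM. The Kyte-Doolittle hydropathy scale, the "-" padding character, all function names, and the regression weights are assumptions made for illustration. Real weights would have to be fitted on labeled cleavage sites.

```python
import math
import re

# Kyte-Doolittle hydropathy values. This scale is an ASSUMPTION; the abstract
# does not say which hydrophobicity scale the logistic regression model used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Canonical furin motif R-X-[K/R]-R; the zero-width lookahead lets
# overlapping motifs (e.g. in "RRRRR") all be reported.
FURIN_MOTIF = re.compile(r"(?=R.[KR]R)")


def find_cleavage_sites(seq: str) -> list[int]:
    """Return cut positions i: cleavage falls between seq[i-1] (P1) and seq[i] (P1')."""
    return [m.start() + 4 for m in FURIN_MOTIF.finditer(seq.upper())]


def extract_window(seq: str, cut: int, left: int = 14, right: int = 6, pad: str = "-") -> str:
    """Return the P14..P6' window (left + right residues), padding past either sequence end."""
    return "".join(seq[j] if 0 <= j < len(seq) else pad for j in range(cut - left, cut + right))


def hydrophobicity_features(window: str) -> list[float]:
    """Encode each window position by its hydropathy value; padding/unknown residues map to 0.0."""
    return [KYTE_DOOLITTLE.get(aa, 0.0) for aa in window.upper()]


def logistic_score(features: list[float], weights: list[float], bias: float) -> float:
    """Probability that the window is a cleavage site, given HYPOTHETICAL fitted weights."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cleavage sites that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    seq = "AAAARRKRSAAA"  # synthetic example, not a real viral sequence
    for cut in find_cleavage_sites(seq):
        window = extract_window(seq, cut)
        print(cut, window)  # prints: 8 ------AAAARRKRSAAA--
```

A regular expression only captures the minimal four-residue motif. Scoring the wider 20-residue window is what lets a model account for context beyond the consensus sequence, which is the motivation the abstract gives for modeling P14 to P6'.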