
Publication Date

Spring 2022

Degree Type

Thesis - Campus Access Only

Degree Name

Master of Science (MS)


Computer Engineering


Simon Shim

Subject Areas

Computer engineering


Understanding the structure of a protein is the basis for determining its function. Because of this principle of molecular biology, a monumental amount of effort has gone into both determining protein structures experimentally and producing computational models that predict them. While experimentally derived structures are still the gold standard, reliance on them limits our understanding of biological processes, as the time required for these methods is on the scale of months to years. As such, the ability to rapidly derive a protein’s structure from just its amino acid sequence is one aspect of the grand scientific challenge known as the “Protein Folding Problem”. This problem has been a topic of research since a physics-based approach was first proposed in the early 1970s. Despite long-standing experimentation, the performance of computational models was unable to approach that of the slower traditional methods. However, the team at DeepMind revolutionized the process with the release of AlphaFold and its successor, AlphaFold2. By utilizing modern deep learning architectures, these models greatly outperformed their competitors in the Critical Assessment of Structure Prediction (CASP) challenge. By analyzing the methodology of AlphaFold and tracking advances in deep learning, it should be possible to develop even more powerful models, capable of accurate structure predictions and even the generation of synthetic data.