Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Bioinformatics (MSBI)
Department
Computer Science
First Advisor
Dr. William Andreopoulos
Second Advisor
Dr. Wendy Lee
Third Advisor
Dr. Abhilash Barpanda
Keywords
Missense mutations, Protein language models, ESM1v, ESM1b, Protein fitness, functional site prediction, AlphaFold2, DBSCAN
Abstract
Missense mutations can alter protein structure and function, yet their functional effects are difficult to predict. In this study, I compared two protein language models, ESM1v and ESM1b, by evaluating their mutation predictions against experimental structural stability data. ESM1v correlated more strongly with the experimental stability scores than ESM1b did. A sigmoid curve was fitted to explore this relationship further. Over 100,000 mutations were identified for which experimental stability differed significantly from the model predictions. Mutations that remained structurally stable in experiments but were predicted as harmful by the ESM models were frequently found at known functional sites. Structural analysis using AlphaFold2 and clustering with DBSCAN showed that these mutations often grouped closely in 3D space. For example, position 188 in protein A1X283, located in a peptide-binding region, illustrates a site that is functionally significant even though mutations there leave the structure stable. These findings demonstrate that comparing ESM1v predictions with experimental data can uncover important functional mutation sites that do not necessarily affect structural stability. This integrated approach provides valuable insights for future protein engineering and disease research.
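The sigmoid-fitting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's code: the data are synthetic, the four-parameter logistic form, the sign convention for the model score (more negative means predicted more harmful), and the residual threshold are all assumptions made for illustration. The idea is to fit a curve relating model score to measured stability, then flag mutations whose measured stability sits far above what the curve predicts.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, lower, upper, midpoint, slope):
    """Four-parameter logistic curve (an assumed functional form)."""
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (x - midpoint)))


rng = np.random.default_rng(0)

# Synthetic stand-ins for real data: one model score and one measured
# stability score per mutation. Assumed convention: a more negative
# model score means the mutation is predicted to be more harmful.
model_score = rng.uniform(-15.0, 0.0, size=500)
stability = sigmoid(model_score, -3.0, 0.0, -7.0, 0.8)
stability = stability + rng.normal(0.0, 0.3, size=500)

# Plant ten mutations that are predicted harmful but measured as stable,
# mimicking the discordant cases described in the abstract.
planted = np.arange(10)
model_score[planted] = -14.0
stability[planted] = 0.0

# Fit the curve, then compute how far each mutation lies from it.
params, _ = curve_fit(
    sigmoid, model_score, stability, p0=[-3.0, 0.0, -7.0, 1.0]
)
residual = stability - sigmoid(model_score, *params)

# Hypothetical cutoff: flag mutations measured as much more stable
# than the fitted curve predicts for their model score.
threshold = 3.0 * np.std(residual)
discordant = np.flatnonzero(residual > threshold)
print(f"{discordant.size} discordant mutations flagged")
```

In a real analysis the flagged indices would be mapped back to residue positions, whose 3D coordinates could then be clustered (for example with DBSCAN) to look for spatially grouped candidate functional sites.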
Recommended Citation
Deo, Rucha, "Combining ESM models with Experimentally Derived Structural Stability to Identify Functional Missense Mutations" (2025). Master's Projects. 1509.
DOI: https://doi.org/10.31979/etd.beg4-c8uv
https://scholarworks.sjsu.edu/etd_projects/1509