Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Bioinformatics (MSBI)

Department

Computer Science

First Advisor

Dr. William Andreopoulos

Second Advisor

Dr. Wendy Lee

Third Advisor

Dr. Abhilash Barpanda

Keywords

Missense mutations, Protein language models, ESM1v, ESM1b, Protein fitness, Functional site prediction, AlphaFold2, DBSCAN

Abstract

Missense mutations can alter protein structure and function, yet their functional effects are difficult to predict. In this study, I compared two protein language models, ESM1v and ESM1b, by evaluating their mutation effect predictions against experimental structural stability data. ESM1v predictions correlated more strongly with experimental stability scores than ESM1b predictions did, and a sigmoid curve was fitted to characterize this relationship further. Over 100,000 mutations were identified for which experimental stability differed substantially from the model predictions. Mutations that remained structurally stable in experiments but were predicted to be harmful by the ESM models were frequently located at known functional sites. Structural analysis using AlphaFold2 models, combined with DBSCAN clustering, showed that these mutations often lie close together in 3D space. For example, position 188 in protein A1X283, located in a peptide-binding region, illustrates a functionally important site where mutations do not compromise structural stability. These findings demonstrate that comparing ESM1v predictions with experimental data can uncover important functional mutation sites that do not necessarily affect structural stability. This integrated approach provides valuable insights for future protein engineering and disease research.
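The 3D clustering step mentioned in the abstract can be illustrated with a minimal, self-contained DBSCAN sketch. It assumes each flagged mutation has already been mapped to the 3D coordinates (for example, the C-alpha atom) of its residue in an AlphaFold2 model. The residue positions, coordinates, `eps` (in angstroms), and `min_samples` values below are hypothetical illustration choices, not data or parameters reported in this project; a real pipeline would more likely call an established implementation such as scikit-learn's `DBSCAN`.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each 3D point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps` of it; clusters grow outward from core points.
    """
    n = len(points)
    labels = [None] * n  # None marks a point not yet visited

    def neighbors(i):
        # Brute-force radius query; fine for the few residues in one protein.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels

# Hypothetical residue positions with made-up coordinates (angstroms).
positions = [185, 186, 188, 190, 40]
coords = [
    (10.0, 10.0, 10.0),
    (11.5, 10.2, 9.8),
    (10.8, 11.9, 10.4),
    (12.0, 11.0, 11.0),
    (40.0, 5.0, -3.0),
]
labels = dbscan(coords, eps=4.0, min_samples=3)
for pos, label in zip(positions, labels):
    print(pos, label)
```

With these made-up inputs, the four nearby residues form one cluster (label 0) and the distant residue is labeled noise (-1). DBSCAN suits this kind of analysis because it does not require choosing the number of clusters in advance and leaves isolated residues unclustered.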

Available for download on Saturday, May 23, 2026