Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Katerina Potika
Second Advisor
William Andreopoulos
Third Advisor
Robert Chun
Keywords
Protein-Protein Interaction, Graph Neural Networks, Relational Graph Convolutional Networks, ProtBERT, ESM-2
Abstract
Accurately predicting protein-protein interactions (PPIs) is essential for understanding cellular function and advancing biomedical discovery. We model PPIs as graphs in which nodes represent proteins and edges denote interactions. Our interaction data come from two benchmark samples of the STRING database, SH27K and SH148K, filtered by confidence score and annotated by interaction mode (multiple relations). In this project, we present EvoRGCN, a graph-based machine learning framework for PPI prediction that integrates both sequence-level information (ESM-2 embeddings) and network-level information. The framework incorporates several Graph Neural Network architectures, including Graph Convolutional Networks, Graph Attention Networks, and Relational Graph Convolutional Networks (RGCNs). Our experiments systematically evaluate how graph directionality (undirected vs. directed), edge semantics (single-mode vs. multi-relational), and node feature type (one-hot encodings vs. ProtBERT vs. ESM-2 embeddings) affect prediction accuracy. Results show that a directed, multi-relational RGCN with ESM-2 embeddings performs best. These findings highlight the effectiveness of combining graph-based learning with transformer-derived protein features and provide a scalable, interpretable framework for computational PPI prediction.
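To illustrate the directed, multi-relational setting the abstract describes, here is a minimal sketch of one relational graph convolution step in plain Python. It is not the project's implementation: the function names (`matvec`, `rgcn_layer`), the toy 2-dimensional feature vectors (stand-ins for real protein embeddings such as ESM-2), the relation labels, and the weight matrices are all assumptions chosen for illustration. Each node sums a self-loop term and the mean of the messages arriving over each relation type, with one weight matrix per relation, and then applies a ReLU.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One relational graph convolution over directed, typed edges.

    features:    {node: feature vector} (toy stand-ins for protein embeddings)
    edges:       list of (source, relation, target) triples
    rel_weights: {relation: weight matrix}, one matrix per interaction type
    self_weight: weight matrix applied to each node's own features
    """
    # Group the incoming neighbours of each target node by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in edges:
        incoming[dst][rel].append(src)

    updated = {}
    for node, h in features.items():
        out = matvec(self_weight, h)  # self-loop term
        for rel, neighbours in incoming[node].items():
            scale = 1.0 / len(neighbours)  # mean over neighbours per relation
            for nb in neighbours:
                msg = matvec(rel_weights[rel], features[nb])
                out = [o + scale * m for o, m in zip(out, msg)]
        updated[node] = [max(0.0, o) for o in out]  # ReLU
    return updated


# Hypothetical toy graph: three proteins, two interaction types.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
edges = [
    ("A", "binding", "C"),
    ("B", "binding", "C"),
    ("A", "activation", "B"),
]
rel_weights = {
    "binding": [[0.5, 0.0], [0.0, 0.5]],
    "activation": [[0.0, 1.0], [1.0, 0.0]],
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]

updated = rgcn_layer(features, edges, rel_weights, self_weight)
print(updated)
```

Because edges are directed, node A receives no messages and keeps its own features, while C aggregates the two "binding" messages from A and B. In an undirected variant, each edge would be added in both directions before calling the layer.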
Recommended Citation
Kunder, Mohit, "EvoRGCN: Harnessing ESM-2 Evolutionary Embeddings with Relational GCNs for High-Fidelity Protein-Protein Interaction Prediction" (2025). Master's Projects. 1549.
DOI: https://doi.org/10.31979/etd.rnwu-5c22
https://scholarworks.sjsu.edu/etd_projects/1549