Author

Mohit Kunder

Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Katerina Potika

Second Advisor

William Andreopoulos

Third Advisor

Robert Chun

Keywords

Protein-Protein Interaction, Graph Neural Networks, Relational Graph Convolutional Networks, ProtBERT, ESM-2

Abstract

Accurately predicting protein-protein interactions (PPIs) is essential for understanding cellular function and advancing biomedical discovery. We model PPIs as graphs in which nodes represent proteins and edges denote interactions. Our interaction data come from two benchmark samples of the STRING database, SH27K and SH148K, which are filtered by confidence score and annotated by interaction mode (multiple relation types). In this project, we present EvoRGCN, a graph-based machine learning framework for PPI prediction that integrates sequence-level information (ESM-2 embeddings) with network-level information. We incorporate several Graph Neural Network architectures, including Graph Convolutional Networks, Graph Attention Networks, and Relational Graph Convolutional Networks (RGCNs). Our experiments systematically evaluate how prediction accuracy depends on graph directionality (undirected vs. directed), edge semantics (single-mode vs. multi-relational), and node feature type (one-hot encodings vs. ProtBERT vs. ESM-2 embeddings). Results show that a directed, multi-relational RGCN with ESM-2 node embeddings performs best among the configurations evaluated. These findings highlight the effectiveness of combining graph-based learning with transformer-derived protein features and provide a scalable, interpretable framework for computational PPI prediction.
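
The abstract's core idea, a directed graph with several interaction types processed by a relational graph convolution, can be illustrated with a minimal sketch. This is not the project's implementation: the function name `rgcn_layer`, the three-protein toy graph, the relation labels, the two-dimensional feature vectors (standing in for ESM-2 embeddings), and the weight matrices are all assumptions chosen for illustration. The update follows the standard R-GCN form, in which each node adds a self-loop term to the per-relation mean of its transformed incoming neighbors and then applies a ReLU.

```python
from collections import defaultdict


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One illustrative R-GCN propagation step followed by ReLU.

    features:    dict node -> list of floats (stand-in for protein embeddings)
    edges:       list of directed (source, relation, target) triples
    rel_weights: dict relation -> weight matrix as a list of rows
    self_weight: weight matrix applied to each node's own features
    """
    def matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    # Group the incoming neighbors of every target node by relation type.
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[(dst, rel)].append(src)

    # Self-loop term: W_0 * h_i for every node.
    out = {node: matvec(self_weight, h) for node, h in features.items()}

    # Relational term: mean over neighbors j of W_r * h_j, per relation r.
    for (dst, rel), sources in incoming.items():
        for src in sources:
            message = matvec(rel_weights[rel], features[src])
            for k, value in enumerate(message):
                out[dst][k] += value / len(sources)

    # ReLU non-linearity.
    return {node: [max(0.0, x) for x in h] for node, h in out.items()}


# Hypothetical toy graph: three proteins and two interaction modes.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
edges = [
    ("A", "binding", "C"),
    ("B", "binding", "C"),
    ("A", "activation", "B"),
]
rel_weights = {
    "binding": [[2.0, 0.0], [0.0, 2.0]],
    "activation": [[0.0, 1.0], [1.0, 0.0]],
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]

hidden = rgcn_layer(features, edges, rel_weights, self_weight)
print(hidden)
# {'A': [1.0, 0.0], 'B': [0.0, 2.0], 'C': [2.0, 2.0]}
```

Because the edges are directed, protein A receives no messages and keeps only its self-loop term, while C averages the two incoming "binding" messages. In practice a library layer (for example a relational convolution from a graph learning toolkit) would likely replace this hand-written loop.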

Available for download on Monday, May 25, 2026
