Author

Ayush Nair

Publication Date

Spring 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Fabio Di Troia

Second Advisor

Robert Chun

Third Advisor

William B. Andreopoulos

Keywords

Maliciousness of URLs, SHAP, LIME

Abstract

No system has ever reached the level of proliferation that the Internet now enjoys. It is the most widely deployed distributed system in the world, yet this growth has been accompanied by an ever-rising wave of malicious activity that threatens every user and entity in cyberspace. Malicious URLs are a prominent attack vector, leaving users exposed as they navigate the web, and cybersecurity researchers build machine learning models to detect them and shield users from cybercrime. Understanding how these models reach their decisions, however, is just as important as their accuracy: it is through such understanding that robust protections for users and platforms can be built. Machine learning models are often described as black boxes because their inner workings are hidden from view. This paper investigates the interpretability of machine learning models, with a specific focus on their use in detecting malicious URLs. Five models are compared in order to identify the most effective one: an MLP, deep models, a Random Forest classifier, an SVM, and XGBoost. Their predictions are examined with both SHAP and LIME, in the hope that this dual approach will illuminate different aspects of each model's operation. By conducting an in-depth analysis and comparison of these two explanation methodologies, the paper aims to shed more light on how the models differ in their behavior and on where precise cybersecurity decisions can be drawn from their explanations.
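The LIME technique mentioned in the abstract explains a single prediction by fitting a simple linear model to the classifier's behavior in a small neighborhood around that input. A minimal sketch of this idea follows, using only NumPy; the black-box scoring function and the URL feature names (url_length, num_dots, has_https) are hypothetical stand-ins, not the paper's actual models or features.

```python
import numpy as np

# Hypothetical black-box URL classifier standing in for any of the paper's
# five models: it scores a URL as more likely malicious when it is long,
# dot-heavy, and lacks HTTPS. Features: [url_length, num_dots, has_https].
def black_box(X):
    score = 0.01 * X[:, 0] + 0.3 * X[:, 1] - 1.5 * X[:, 2]
    return 1.0 / (1.0 + np.exp(-(score - 1.0)))  # P(malicious)

def lime_explain(f, x, n_samples=5000, scale=1.0, seed=0):
    """LIME-style local surrogate: perturb x with Gaussian noise, weight
    samples by proximity to x, and fit a weighted linear model whose
    coefficients serve as local feature importances.
    (Perturbing the binary has_https feature with noise is a simplification.)"""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = f(Z)
    # Exponential proximity kernel: nearby perturbations count more.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2.0 * scale ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z]) * np.sqrt(w)[:, None]
    b = y * np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef[1:]  # per-feature local weights (intercept dropped)

x = np.array([75.0, 4.0, 0.0])  # a long, dot-heavy, non-HTTPS URL
weights = lime_explain(black_box, x)
for name, wgt in zip(["url_length", "num_dots", "has_https"], weights):
    print(f"{name}: {wgt:+.4f}")
```

For this toy black box the recovered local weights are positive for url_length and num_dots and negative for has_https, matching the direction in which each feature pushes the maliciousness score near this URL. SHAP instead attributes the prediction via Shapley values, averaging each feature's contribution over coalitions rather than fitting one local linear model.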

Available for download on Friday, May 23, 2025