Publication Date
Spring 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Ching-Seh Wu
Second Advisor
Navrati Saxena
Third Advisor
Thomas Austin
Keywords
Adaptive Phishing Detection, Continual Learning, Elastic Weight Consolidation, Learning Without Forgetting, Large Language Models, GPT-4o-mini
Abstract
Adaptive phishing detection remains crucial because cyber-attacks evolve over time, rendering static models obsolete. This project extends phishing detection by implementing continual learning approaches, namely Elastic Weight Consolidation (EWC) and Learning Without Forgetting (LWF), with RoBERTa, a Large Language Model (LLM), and compares these approaches against GPT-4o-mini, another LLM. Our approach begins by fine-tuning RoBERTa on multiple phishing datasets to establish an effective baseline. EWC is then implemented to preserve vital model parameters, weighted by their importance as measured by the Fisher Information Matrix, while LWF uses knowledge distillation to retain the model's prior outputs when adapting to new data. GPT-4o-mini, in turn, is evaluated through zero-shot prompting for classification and is also used to generate synthetic phishing emails, addressing data scarcity and class imbalance. A prototype Chrome extension is developed to demonstrate the usefulness of our adaptive system in real-time email filtering. In our experiments on four sequential phishing datasets from different time periods, RoBERTa + EWC preserved accuracies of 93%, 84%, 87%, and 98%, respectively, dramatically reducing the catastrophic forgetting seen with static fine-tuning. RoBERTa + LWF delivered even better results, achieving accuracies of 96%, 85%, 97%, and 99% on the same splits.
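The two continual learning objectives named in the abstract can be sketched as loss terms added to the ordinary task loss. The following is a minimal NumPy illustration, not the project's actual implementation: the function names, the diagonal-Fisher assumption for EWC, and the temperature value for LWF distillation are all illustrative choices.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """EWC regularizer: (lam/2) * sum_i F_i * (theta_i - theta*_i)^2.
    `fisher` holds diagonal Fisher Information estimates that weight how
    important each parameter was for the previous task; `theta_star` is a
    snapshot of the parameters after training on that task."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

def lwf_distillation(new_logits, old_logits, T=2.0):
    """LWF distillation term: cross-entropy between the temperature-softened
    outputs of the frozen old model (teacher) and the updated model
    (student), encouraging the student to retain prior responses."""
    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)
    p_old = softmax(np.asarray(old_logits) / T)          # soft targets
    log_p_new = np.log(softmax(np.asarray(new_logits) / T))
    return -np.mean(np.sum(p_old * log_p_new, axis=-1))
```

In training, either term would be added to the new-task loss, e.g. `loss = task_loss + ewc_penalty(theta, theta_star, fisher, lam)`; the penalty is zero when parameters (or outputs) have not drifted from the previous task's snapshot.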
Recommended Citation
Battula, Gopi Prajeev, "PHISHING DETECTION USING CONTINUAL LEARNING AND LARGE LANGUAGE MODELS" (2025). Master's Projects. 1512.
DOI: https://doi.org/10.31979/etd.v95u-g9zz
https://scholarworks.sjsu.edu/etd_projects/1512