Publication Date
Fall 2025
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Mark Stamp
Second Advisor
Jelena Gligorijevic
Third Advisor
Katerina Potika
Keywords
Concept Drift, Model Retraining, Malware Detection, One-Class Support Vector Machines, Minibatch K-Means
Abstract
Concept drift refers to changes over time in the statistical properties of data relative to the data used to train a learning model. Machine learning models for malware detection are particularly susceptible to performance degradation due to concept drift, as attackers continually modify existing malware. We consider two unsupervised machine learning approaches to automated concept drift detection: One-Class Support Vector Machines (OCSVM) and Minibatch K-Means (MK-Means). We compare these techniques to Maximum Mean Discrepancy (MMD), a statistical technique for detecting distribution shift. We conduct experiments comparing four models (MLP, RF, SVM, XGB) on the KronoDroid malware dataset across three scenarios: static (no retraining), drift-aware (retraining when drift is detected), and periodic (constant retraining). In most cases, drift-aware retraining based on OCSVM, MK-Means, or MMD performs almost as well as periodic retraining while requiring far fewer retrained models.
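To make the drift-aware scenario concrete, the following is a minimal sketch of an MMD-based drift check in plain Python. It is not the project's implementation: the RBF kernel, the `gamma` value, the threshold of 0.1, the function names `mmd2` and `drift_detected`, and the synthetic Gaussian data are all assumptions chosen for illustration. The idea is to compare a reference sample from training time against a newer batch and signal retraining only when the estimated discrepancy exceeds the threshold.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2(X, Y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between X and Y."""
    kxx = sum(rbf_kernel(a, b, gamma) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(rbf_kernel(a, b, gamma) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(rbf_kernel(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy


def drift_detected(reference, batch, threshold=0.1, gamma=1.0):
    """Hypothetical decision rule: flag drift when the MMD estimate exceeds
    an assumed, hand-picked threshold."""
    return mmd2(reference, batch, gamma) > threshold


# Synthetic stand-ins for feature vectors (assumed data, not KronoDroid).
rng = random.Random(0)
reference = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
shifted = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(100)]

for name, batch in [("same", same), ("shifted", shifted)]:
    action = "retrain" if drift_detected(reference, batch) else "keep model"
    print(name, round(mmd2(reference, batch), 4), action)
```

In a drift-aware setup of this kind, the classifier would be retrained only on batches that trigger the check, whereas periodic retraining would retrain on every batch regardless. In practice the threshold would need to be calibrated, for example with a permutation test, rather than fixed by hand as it is here.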
Recommended Citation
Chungata, Christofer Washington Berruz, "Selective Model Retraining for Malware Detection Using Drift Detection" (2025). Master's Projects. 1619.
DOI: https://doi.org/10.31979/etd.8nev-863b
https://scholarworks.sjsu.edu/etd_projects/1619