Publication Date
Fall 2024
Degree Type
Master's Project
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
First Advisor
Mark Stamp
Second Advisor
Genya Ishigaki
Third Advisor
Fabio Di Troia
Keywords
Concept drift, malware detection, clustering, MiniBatch K-Means, silhouette coefficient, KronoDroid dataset, Android malware, XGBoost, Random Forests, MLP, Linear SVM
Abstract
The rapid evolution of malware presents significant challenges for detection systems. Malware families adapt through feature manipulation and obfuscation, which causes concept drift. This project uses a clustering-based approach to detect and adapt to these shifts. The KronoDroid dataset is segmented into batches of size 50 and analyzed with MiniBatch K-Means clustering. The silhouette coefficient is used to evaluate clustering quality and helps identify drift by detecting significant changes in cluster patterns. Detected drift triggers retraining of supervised classifiers, including Linear SVM, Random Forest (RF), multilayer perceptron (MLP), and XGBoost. Three scenarios are compared: static models, periodic retraining, and drift-aware retraining. Results show that drift-aware retraining achieves the highest accuracy. This research shows the benefit of combining unsupervised clustering with supervised learning and helps enhance malware detection systems.
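The batch-wise drift check described in the abstract can be illustrated with a minimal sketch. It is not the project's actual implementation. It assumes scikit-learn's `MiniBatchKMeans` and `silhouette_score`, and it flags drift when the silhouette coefficient of a batch differs from that of the previous batch by more than a fixed threshold. The batch size of 50 comes from the abstract; the number of clusters, the threshold value, the previous-batch comparison rule, and the helper names `silhouette_drift` and `make_demo_data` are assumptions made for illustration. The demo data is synthetic, not KronoDroid.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def silhouette_drift(X, batch_size=50, n_clusters=2, threshold=0.2, seed=0):
    """Cluster X batch by batch and flag batches whose silhouette score jumps.

    Assumption: "drift" means the score differs from the previous batch's
    score by more than `threshold`. The source does not state its exact rule.
    """
    model = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    scores, drift_batches = [], []
    for i in range(len(X) // batch_size):
        batch = X[i * batch_size:(i + 1) * batch_size]
        model.partial_fit(batch)       # incremental update on this batch only
        labels = model.predict(batch)  # assign batch points to current centers
        # The silhouette coefficient is undefined for a single cluster.
        if len(np.unique(labels)) > 1:
            score = float(silhouette_score(batch, labels))
        else:
            score = 0.0
        if scores and abs(score - scores[-1]) > threshold:
            drift_batches.append(i)    # a retraining hook would be called here
        scores.append(score)
    return scores, drift_batches


def make_demo_data(seed=0):
    """Hypothetical synthetic stream: two clear clusters, then one merged blob."""
    rng = np.random.default_rng(seed)

    def two_blobs():
        a = rng.normal(0.0, 0.5, size=(25, 2))
        b = rng.normal(10.0, 0.5, size=(25, 2))
        return np.vstack([a, b])

    def one_blob():
        return rng.normal(5.0, 0.5, size=(50, 2))

    return np.vstack([two_blobs(), two_blobs(), one_blob(), one_blob()])


if __name__ == "__main__":
    scores, drift_batches = silhouette_drift(make_demo_data())
    print("silhouette per batch:", [round(s, 3) for s in scores])
    print("batches flagged as drift:", drift_batches)
```

In a drift-aware retraining setup, each flagged batch index would be the point at which the supervised classifier is refit on recent data. Comparing against a running average of past scores instead of only the previous batch is another plausible design and would be less sensitive to single noisy batches.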
Recommended Citation
Mishra, Aniket, "Cluster Analysis for Concept Drift Detection in Malware" (2024). Master's Projects. 1447.
https://scholarworks.sjsu.edu/etd_projects/1447