Author

Aniket Mishra

Publication Date

Fall 2024

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Mark Stamp

Second Advisor

Genya Ishigaki

Third Advisor

Fabio Di Troia

Keywords

Concept drift, malware detection, clustering, MiniBatch K-Means, silhouette coefficient, KronoDroid dataset, Android malware, XGBoost, Random Forests, MLP, Linear SVM

Abstract

The rapid evolution of malware presents significant challenges for detection systems. Malware families adapt through feature manipulation and obfuscation, which causes concept drift. This project uses a clustering-based approach to detect and adapt to these shifts. The KronoDroid dataset is segmented into batches of size 50 and analyzed with MiniBatch K-Means clustering. The silhouette coefficient is used to evaluate clustering quality and helps identify drift by detecting significant changes in cluster patterns. Detected drift triggers retraining of supervised classifiers, including Linear SVM, Random Forest (RF), MLP, and XGBoost. Three scenarios are compared: static models, periodic retraining, and drift-aware retraining. Results show that drift-aware retraining achieves the highest accuracy of the three. This research demonstrates the benefit of combining unsupervised clustering with supervised learning to improve malware detection systems.
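
The drift-detection idea summarized in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the project's implementation. The function names `batch_silhouettes` and `drift_points`, the choice of two clusters, refitting a fresh model on every batch, and the 0.2 change threshold are all assumptions made for illustration; only the batch size of 50, MiniBatch K-Means, and the silhouette coefficient come from the abstract.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def batch_silhouettes(X, batch_size=50, n_clusters=2, seed=0):
    """Cluster each consecutive batch and return one silhouette score per batch.

    Hypothetical helper: fits a fresh MiniBatchKMeans model on every batch
    (an assumption; the abstract does not say how models are updated).
    """
    scores = []
    for start in range(0, len(X) - batch_size + 1, batch_size):
        batch = X[start:start + batch_size]
        km = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=seed)
        labels = km.fit_predict(batch)
        if len(set(labels)) < 2:
            # silhouette_score is undefined for a single cluster; record 0.0
            scores.append(0.0)
        else:
            scores.append(float(silhouette_score(batch, labels)))
    return scores


def drift_points(scores, threshold=0.2):
    """Return indices of batches whose silhouette score differs from the
    previous batch by more than `threshold` (an assumed drift criterion)."""
    return [
        i for i in range(1, len(scores))
        if abs(scores[i] - scores[i - 1]) > threshold
    ]
```

In a drift-aware retraining loop, the indices returned by `drift_points` would mark the batches at which a supervised classifier is refit on recent data, in contrast to a static model (never refit) or periodic retraining (refit at a fixed interval).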

Available for download on Wednesday, December 31, 2025
