Master of Science in Data Science (MSDS)
Mathematics and Statistics
Tahir Bachar Issa
Scalability, Spectral Clustering, Unsupervised Learning
Spectral clustering has many advantages over more traditional clustering methods, such as k-means and Gaussian mixture models (GMM), and has been popular since it was introduced. However, two major challenges, speed scalability and memory scalability, impede its wide application. The first challenge has been addressed recently by Chen in the special setting of sparse or low-dimensional data sets. In this work, we first review the recent study by Chen that speeds up spectral clustering. We then propose three new computational methods, for the same special setting of sparse or low-dimensional data, that address the memory challenge when the data sets are too large to be fully loaded into computer memory and when the data sets are collected sequentially. Numerical experiments are presented to demonstrate the improvements from these methods, which prove effective on both simulated and real-world data.
Li, Ran, "On The Memory Scalability of Spectral Clustering Algorithms" (2023). Master's Projects. 1300.