Publication Date
Spring 2023
Degree Type
Master's Project
Degree Name
Master of Science in Data Science (MSDS)
Department
Mathematics and Statistics
First Advisor
Guangliang Chen
Second Advisor
Teng Moh
Third Advisor
Tahir Bachar Issa
Keywords
Scalability, Spectral Clustering, Unsupervised Learning
Abstract
Spectral clustering has many advantages over more traditional clustering methods, such as k-means and Gaussian Mixture Models (GMM), and has been popular since its introduction. However, two major challenges, speed scalability and memory scalability, impede its wider application. The first challenge was recently addressed by Chen [1] [2] in the special setting of sparse or low-dimensional data sets. In this work, we first review Chen's recent study on speeding up spectral clustering. We then propose three new computational methods for the same special setting of sparse or low-dimensional data that address the memory challenge, both when the data sets are too large to be fully loaded into computer memory and when the data are collected sequentially. Numerical experiments are presented to demonstrate the improvements these methods provide, and they show that the proposed methods are effective on both simulated and real-world data.
Recommended Citation
Li, Ran, "On The Memory Scalability of Spectral Clustering Algorithms" (2023). Master's Projects. 1300.
DOI: https://doi.org/10.31979/etd.u8sm-ubx5
https://scholarworks.sjsu.edu/etd_projects/1300