Publication Date

Spring 2023

Degree Type

Master's Project

Degree Name

Master of Science in Data Science (MSDS)

Department

Mathematics and Statistics

First Advisor

Guangliang Chen

Second Advisor

Teng Moh

Third Advisor

Tahir Bachar Issa

Keywords

Scalability, Spectral Clustering, Unsupervised Learning

Abstract

Spectral clustering has many advantages over more traditional clustering methods, such as k-means and Gaussian Mixture Models (GMM), and has been popular since its introduction. However, two major challenges, speed scalability and memory scalability, impede its wide application. The first challenge has been addressed recently by Chen [1] [2] in the special setting of sparse or low-dimensional data sets. In this work, we first review the recent study by Chen that speeds up spectral clustering. We then propose three new computational methods, for the same special setting of sparse or low-dimensional data, that address the memory challenge when the data sets are too large to be fully loaded into computer memory or when the data are collected sequentially. Numerical experiments are presented to demonstrate the improvements from these methods. In these experiments, the proposed methods are effective on both simulated and real-world data.
