A fast incremental spectral clustering algorithm with cosine similarity
Publication Date
1-1-2023
Document Type
Conference Proceeding
Publication Title
IEEE International Conference on Data Mining Workshops, ICDMW
DOI
10.1109/ICDMW60847.2023.00019
First Page
80
Last Page
88
Abstract
Spectral clustering is a popular and powerful clustering method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on the speed scalability of spectral clustering in the setting of cosine similarity to deal with massive or online data that are too large to be fully loaded into computer memory. We start by drawing a small batch of data from the full set and develop an efficient procedure that approximately learns from the sample both the nonlinear embedding and the clustering map of spectral clustering with the cosine similarity. We then introduce an incremental approach that continuously refines them while sampling more batches of data. The combination of the two procedures leads to a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. The final nonlinear embedding and clustering rule can be easily applied to the rest of the data as they are gradually loaded. Experiments are conducted on benchmark data to demonstrate the fast speed and good accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
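The key observation exploited by this line of work is that, with cosine similarity, the affinity matrix is an inner product of row-normalized data, so the spectral embedding can be obtained from an SVD of the (degree-scaled) data matrix without ever forming the n-by-n affinity matrix. The sketch below illustrates that trick only, not the paper's incremental batching scheme; the function name and the nonnegative-data assumption (which keeps the degrees positive) are ours, not from the paper.

```python
import numpy as np

def cosine_spectral_embed(X, k):
    """Top-k spectral embedding for the cosine-similarity affinity
    W = Xn Xn^T, computed without forming W (assumes nonnegative X
    so that all degrees are positive)."""
    # Row-normalize so that inner products equal cosine similarities.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Degrees d = W 1 = Xn (Xn^T 1): two matrix-vector products, no n x n matrix.
    d = Xn @ (Xn.T @ np.ones(X.shape[0]))
    # Left singular vectors of Xt = D^{-1/2} Xn are the eigenvectors of
    # the normalized affinity D^{-1/2} W D^{-1/2}.
    Xt = Xn / np.sqrt(d)[:, None]
    U, _, _ = np.linalg.svd(Xt, full_matrices=False)
    return U[:, :k]  # feed to k-means (after optional row-normalization)
```

For n points in d dimensions this costs O(n d^2) for the thin SVD rather than the O(n^2 d) needed to build the affinity matrix, which is what makes the cosine-similarity setting scalable in the first place.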
Keywords
Cosine similarity, Incremental learning, Memory scalability, Spectral clustering, Speed scalability
Department
Mathematics and Statistics
Recommended Citation
Ran Li and Guangliang Chen. "A fast incremental spectral clustering algorithm with cosine similarity" IEEE International Conference on Data Mining Workshops, ICDMW (2023): 80-88. https://doi.org/10.1109/ICDMW60847.2023.00019