Fast, Memory-Efficient Spectral Clustering with Cosine Similarity
Publication Date
January 1, 2024
Document Type
Conference Proceeding
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
14469 LNCS
DOI
10.1007/978-3-031-49018-7_50
First Page
700
Last Page
714
Abstract
Spectral clustering is a popular and effective method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on the speed scalability of spectral clustering with cosine similarity to handle massive or online data that are too large to be fully loaded into computer memory. We start with a small batch of data drawn from the full set and develop an efficient procedure that learns both the nonlinear embedding and the clustering map from the sample and extends them easily to the rest of the data as they are gradually loaded. We then introduce an automatic approach to selecting the optimal sample size. The combination of the two steps yields a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. Experiments on benchmark data sets demonstrate the fast speed and excellent accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
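As a rough illustration of the approach summarized above, the sketch below performs sample-based spectral clustering with cosine similarity and a simple out-of-sample extension via the thin SVD, in the spirit of Chen (ICPR 2018). The function names (fit_sample, extend_batch) and the specific extension rule are assumptions made for illustration, not the paper's exact algorithm.

```python
# Minimal sketch (not the paper's exact algorithm): for cosine similarity,
# the n x n matrix W = X X^T of L2-normalized rows is never formed; the
# spectral embedding comes from a thin SVD of D^{-1/2} X instead.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def fit_sample(X_sample, k):
    """Learn the spectral embedding and clustering map from a small sample."""
    X = normalize(X_sample)                      # unit-norm rows -> cosine similarity
    s = X.sum(axis=0)                            # column sums, reused for extension
    d = np.maximum(X @ s, 1e-12)                 # degrees d_i = x_i^T (sum_j x_j)
    U, S, Vt = np.linalg.svd(X / np.sqrt(d)[:, None], full_matrices=False)
    V_k, S_k = Vt[:k].T, S[:k]                   # top-k right singular pairs
    km = KMeans(n_clusters=k, n_init=10).fit(normalize(U[:, :k]))
    return s, V_k, S_k, km

def extend_batch(X_new, s, V_k, S_k, km):
    """Embed a new batch in the learned singular subspace and label it."""
    X = normalize(X_new)
    d = np.maximum(X @ s, 1e-12)                 # degrees approximated via the sample
    U_new = (X / np.sqrt(d)[:, None]) @ V_k / S_k  # Nystrom-style extension
    return km.predict(normalize(U_new))

# Hypothetical usage: fit on a 200-point sample, then label later batches
# as they arrive, without ever holding the full data set in memory.
rng = np.random.default_rng(0)
s, V_k, S_k, km = fit_sample(rng.standard_normal((200, 50)), k=3)
labels = extend_batch(rng.standard_normal((1000, 50)), s, V_k, S_k, km)
```

Per-batch cost depends only on the batch size and dimension, which is consistent with the abstract's claim that memory and computation are independent of the total data size.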
Keywords
Cosine similarity, Memory scalability, Spectral clustering, Speed scalability
Department
Mathematics and Statistics
Recommended Citation
Ran Li and Guangliang Chen. "Fast, Memory-Efficient Spectral Clustering with Cosine Similarity." Lecture Notes in Computer Science, vol. 14469 (2024): 700-714. https://doi.org/10.1007/978-3-031-49018-7_50