A fast incremental spectral clustering algorithm with cosine similarity

Publication Date

1-1-2023

Document Type

Conference Proceeding

Publication Title

IEEE International Conference on Data Mining Workshops, ICDMW

DOI

10.1109/ICDMW60847.2023.00019

First Page

80

Last Page

88

Abstract

Spectral clustering is a popular and powerful clustering method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on the speed scalability of spectral clustering in the setting of cosine similarity to handle massive or online data that are too large to be fully loaded into computer memory. We start by drawing a small batch of data from the full set and develop an efficient procedure that approximately learns from this sample both the nonlinear embedding and the clustering map of spectral clustering with cosine similarity. We then introduce an incremental approach that continuously refines them as more batches of data are sampled. The combination of the two procedures yields a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. The final nonlinear embedding and clustering rule can be easily applied to the rest of the data as they are gradually loaded. Experiments on benchmark data demonstrate the fast speed and good accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
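
To illustrate the core idea underlying this line of work, below is a minimal, single-batch sketch of spectral clustering with cosine similarity computed via a truncated SVD of the degree-normalized data matrix, in the spirit of Chen (ICPR 2018). It is a simplified approximation, not the paper's incremental algorithm: the function name spectral_clustering_cosine and all variable names are our own illustrative choices, and for simplicity it keeps self-loops in the degrees and assumes they are positive.

    # Illustrative sketch (not the paper's incremental algorithm):
    # spectral clustering with cosine similarity via SVD of the
    # normalized data matrix, never forming the n x n affinity matrix.
    import numpy as np
    from scipy.sparse.linalg import svds
    from sklearn.cluster import KMeans

    def spectral_clustering_cosine(X, k):
        """Cluster the rows of X into k groups using cosine-similarity
        spectral clustering on a single batch of data."""
        n = X.shape[0]
        # Normalize rows to unit length so that X_norm @ X_norm.T is
        # the cosine-similarity matrix W (never formed explicitly).
        X_norm = X / np.linalg.norm(X, axis=1, keepdims=True)
        # Degrees d = W @ 1, computed in O(np) as X_norm (X_norm^T 1);
        # self-loops are kept here for simplicity, and degrees are
        # assumed positive (true, e.g., for nonnegative data).
        d = X_norm @ (X_norm.T @ np.ones(n))
        # Rows of D^{-1/2} X_norm: its top-k left singular vectors are
        # the top-k eigenvectors of D^{-1/2} W D^{-1/2}, i.e., the
        # spectral embedding of the normalized graph Laplacian.
        X_tilde = X_norm / np.sqrt(d)[:, None]
        U, _, _ = svds(X_tilde, k=k)
        # Row-normalize the embedding and cluster with k-means.
        U = U / np.linalg.norm(U, axis=1, keepdims=True)
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)

    # Example usage on synthetic data:
    labels = spectral_clustering_cosine(np.random.rand(1000, 50), k=3)

Because the embedding comes from an SVD of an n x p matrix rather than an eigendecomposition of an n x n affinity matrix, the cost scales with the data dimension p instead of the sample size n, which is the property the paper's batch-wise incremental refinement builds on.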

Keywords

Cosine similarity, Incremental learning, Memory scalability, Spectral clustering, Speed scalability

Department

Mathematics and Statistics
