Fast, Memory-Efficient Spectral Clustering with Cosine Similarity

Publication Date

1-1-2024

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

14469 LNCS

DOI

10.1007/978-3-031-49018-7_50

First Page

700

Last Page

714

Abstract

Spectral clustering is a popular and effective method, but it is known to face two significant challenges: scalability and out-of-sample extension. In this paper, we extend the work of Chen (ICPR 2018) on the speed scalability of spectral clustering in the setting of cosine similarity to deal with massive or online data that are too large to be fully loaded into computer memory. We start with a small batch of data drawn from the full set and develop an efficient procedure that learns both the nonlinear embedding and the clustering map from the sample and extends them easily to the rest of the data as they are gradually loaded. We then introduce an automatic approach to selecting the optimal sample size. Combining the two steps yields a streamlined, memory-efficient algorithm that uses only a small number of batches of data (as they become available), with memory and computational costs that are independent of the size of the data. Experiments are conducted on benchmark data sets to demonstrate the fast speed and excellent accuracy of the proposed algorithm. We conclude the paper by pointing out several future research directions.
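
For intuition only, the Python sketch below illustrates the general technique the abstract alludes to: under cosine similarity, the spectral embedding can be obtained from the SVD of the row-normalized data matrix, without ever forming the n-by-n similarity matrix, and new batches can be embedded and assigned by a Nystrom-style out-of-sample projection using quantities learned from the initial sample. The function names (fit_sample, extend_batch) and the simplifications noted in the comments are assumptions for illustration, not the paper's exact algorithm; in particular, the paper's automatic sample-size selection is not shown.

    import numpy as np
    from scipy.sparse.linalg import svds
    from sklearn.cluster import KMeans

    def fit_sample(X_sample, k):
        # Row-normalize so that inner products equal cosine similarities.
        Xn = X_sample / np.linalg.norm(X_sample, axis=1, keepdims=True)
        # Column sums give node degrees of the implicit similarity graph
        # W = Xn @ Xn.T without forming the n-by-n matrix W itself.
        colsum = Xn.sum(axis=0)
        d = Xn @ colsum  # degrees (self-similarity kept for simplicity;
                         # assumes nonnegative data so all degrees are > 0)
        Y = Xn / np.sqrt(d)[:, None]
        # Top-k singular triplets of the degree-normalized data matrix;
        # the left singular vectors span the spectral embedding.
        U, s, Vt = svds(Y, k=k)
        E = U / np.linalg.norm(U, axis=1, keepdims=True)
        km = KMeans(n_clusters=k, n_init=10).fit(E)
        return km, Vt, s, colsum

    def extend_batch(X_batch, km, Vt, s, colsum):
        # Embed a newly loaded batch using quantities learned from the sample.
        Xn = X_batch / np.linalg.norm(X_batch, axis=1, keepdims=True)
        d = Xn @ colsum                   # degrees approximated w.r.t. the sample
        Y = Xn / np.sqrt(d)[:, None]
        U_new = (Y @ Vt.T) / s            # Nystrom-style out-of-sample projection
        E = U_new / np.linalg.norm(U_new, axis=1, keepdims=True)
        return km.predict(E)

On a stream, fit_sample would run once on the initial batch and extend_batch on each subsequent batch, so memory use stays proportional to the batch size rather than the full data size, consistent with the abstract's cost claims.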

Keywords

Cosine similarity, Memory scalability, Spectral clustering, Speed scalability

Department

Mathematics and Statistics
