Faculty Research, Scholarly, and Creative Activity

Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes

Jarrett Jimeno, San Jose State University
Madhumita Roy, San Jose State University
Cristina Tortora, San Jose State University

Publication Date

1-1-2021

Document Type

Conference Proceeding

Publication Title

Studies in Classification, Data Analysis, and Knowledge Organization

Volume

DOI

10.1007/978-3-030-60104-1_10

First Page

83

Last Page

91

Abstract

Benchmarking in cluster analysis is the process of analyzing which clustering techniques give the best result for different types of data structures as well as setting a standard for evaluation of newer clustering methods. There are many instances of benchmarking in cluster analysis for continuous data, but only a few for mixed-type data, i.e. data sets with nominal and continuous variables. Therefore, we explore the process for benchmarking various clustering methods on simulated mixed-type data sets with varying proportions of continuous and nominal variables. For this purpose, we test a newer clustering algorithm, KAMILA, against K-prototypes and tandem analysis where data are preprocessed using multiple correspondence analysis and then clustered using K-means, fuzzy K-means, probabilistic distance clustering (PD), and Student-t mixture models.
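As a rough illustration of one of the methods named in the abstract, the following is a minimal sketch of the K-prototypes idea as it is commonly described: each observation is assigned to the prototype that minimizes squared Euclidean distance on the continuous variables plus a weight times the number of mismatches on the nominal variables, and each prototype is then updated with means and modes. It is not the authors' code or their simulation design. The function names, the weight `gamma`, the toy simulation, and all parameter values are assumptions made only for this example. KAMILA and the MCA-based tandem approaches are not shown.

```python
import random
from collections import Counter

def dissimilarity(x, center, n_cont, gamma):
    """Squared Euclidean distance on the first n_cont (continuous) entries
    plus gamma times the number of mismatches on the remaining (nominal) entries."""
    cont = sum((a - b) ** 2 for a, b in zip(x[:n_cont], center[:n_cont]))
    nom = sum(a != b for a, b in zip(x[n_cont:], center[n_cont:]))
    return cont + gamma * nom

def update_center(members, n_cont):
    """Prototype of a cluster: means of continuous columns, modes of nominal columns."""
    cols = list(zip(*members))
    means = [sum(c) / len(c) for c in cols[:n_cont]]
    modes = [Counter(c).most_common(1)[0][0] for c in cols[n_cont:]]
    return means + modes

def k_prototypes(data, k, n_cont, gamma=1.0, n_iter=20, seed=0):
    """Alternate assignment and prototype updates until labels stop changing.
    gamma (hypothetical default 1.0) balances nominal against continuous variables."""
    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(data, k)]  # random initial prototypes
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(x, centers[j], n_cont, gamma))
            for x in data
        ]
        if new_labels == labels:
            break  # converged: no observation changed cluster
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old prototype if a cluster becomes empty
                centers[j] = update_center(members, n_cont)
    return labels, centers

def simulate(n_per_cluster=50, seed=1):
    """Toy mixed-type data (an assumption, not the paper's design):
    two continuous variables and one nominal variable per observation."""
    rng = random.Random(seed)
    data, truth = [], []
    for g, (mu, levels) in enumerate([(0.0, "ab"), (6.0, "cd")]):
        for _ in range(n_per_cluster):
            data.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0), rng.choice(levels)])
            truth.append(g)
    return data, truth

data, truth = simulate()
labels, centers = k_prototypes(data, k=2, n_cont=2, gamma=1.0)
```

In a benchmark of the kind the abstract describes, the returned `labels` would be compared with `truth` using an agreement index, and the proportion of continuous to nominal columns would be varied across simulated data sets. The choice of `gamma` matters in practice, because it decides how much the nominal variables influence the partition.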

Keywords

K-prototypes, KAMILA, Mixed-type data clustering, Multiple correspondence analysis

Department

Mathematics and Statistics

Recommended Citation

Jarrett Jimeno, Madhumita Roy, and Cristina Tortora. "Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes" Studies in Classification, Data Analysis, and Knowledge Organization (2021): 83-91. https://doi.org/10.1007/978-3-030-60104-1_10
