Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes
Studies in Classification, Data Analysis, and Knowledge Organization
Benchmarking in cluster analysis is the process of determining which clustering techniques give the best results for different types of data structures, and of setting a standard for the evaluation of newer clustering methods. There are many instances of benchmarking in cluster analysis for continuous data, but only a few for mixed-type data, i.e., data sets with both nominal and continuous variables. We therefore explore the process of benchmarking various clustering methods on simulated mixed-type data sets with varying proportions of continuous and nominal variables. For this purpose, we test a newer clustering algorithm, KAMILA, against K-prototypes and against tandem analysis, in which the data are preprocessed using multiple correspondence analysis and then clustered using K-means, fuzzy K-means, probabilistic distance clustering (PD), and Student-t mixture models.
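To make the phrase "simulated mixed-type data sets with varying proportions of continuous and nominal variables" concrete, the following is a minimal sketch of one way such data could be generated. It is not the authors' simulation design: the function name `simulate_mixed`, the Gaussian and categorical generating mechanisms, and every parameter value (cluster separation, number of levels, the 0.7 probability, eight variables in total) are assumptions chosen only for illustration.

```python
import random

def simulate_mixed(n, n_cont, n_nom, n_clusters=2, n_levels=3, sep=3.0, seed=0):
    """Hypothetical generator: returns (rows, labels).

    Each row holds n_cont floats followed by n_nom integer-coded nominal levels.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        k = rng.randrange(n_clusters)  # true cluster label of this observation
        # Continuous variables: unit-variance Gaussians whose mean shifts with the cluster.
        cont = [rng.gauss(sep * k, 1.0) for _ in range(n_cont)]
        # Nominal variables: with probability 0.7 take the cluster's preferred level,
        # otherwise draw a level uniformly at random (assumed mechanism).
        nom = []
        for _ in range(n_nom):
            if rng.random() < 0.7:
                nom.append(k % n_levels)
            else:
                nom.append(rng.randrange(n_levels))
        rows.append(cont + nom)
        labels.append(k)
    return rows, labels

# Vary the share of continuous variables while keeping eight variables in total.
designs = {
    (n_cont, 8 - n_cont): simulate_mixed(200, n_cont, 8 - n_cont)
    for n_cont in (2, 4, 6)
}
```

Each entry of `designs` pairs a (continuous, nominal) variable count with a data set and its true labels, so that any clustering method under comparison could be run on every design and scored against the known labels.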
K-prototypes, KAMILA, Mixed-type data clustering, Multiple correspondence analysis
Mathematics and Statistics
Jarrett Jimeno, Madhumita Roy, and Cristina Tortora. "Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes." Studies in Classification, Data Analysis, and Knowledge Organization (2021): 83-91. https://doi.org/10.1007/978-3-030-60104-1_10