Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes
Publication Date
1-1-2021
Document Type
Conference Proceeding
Publication Title
Studies in Classification, Data Analysis, and Knowledge Organization
Volume
5
DOI
10.1007/978-3-030-60104-1_10
First Page
83
Last Page
91
Abstract
Benchmarking in cluster analysis identifies which clustering techniques give the best results for different types of data structures and sets a standard for evaluating newer clustering methods. There are many benchmarking studies in cluster analysis for continuous data, but only a few for mixed-type data, i.e., data sets with both nominal and continuous variables. We therefore explore how to benchmark various clustering methods on simulated mixed-type data sets with varying proportions of continuous and nominal variables. For this purpose, we test a newer clustering algorithm, KAMILA, against K-prototypes and against tandem analysis, in which the data are preprocessed using multiple correspondence analysis and then clustered using K-means, fuzzy K-means, probabilistic distance clustering (PD), and Student-t mixture models.
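To make the setting concrete, the following is a minimal Python sketch of two ideas named in the abstract: simulating mixed-type data with a chosen proportion of continuous variables, and the dissimilarity commonly used by K-prototypes (squared Euclidean distance on the continuous part plus a weight times the number of nominal mismatches). The function names, the two-cluster design, the 0.8 agreement probability, and the default weight `gamma` are illustrative assumptions and are not taken from the paper's simulation design.

```python
import random

def simulate_mixed(n_rows, n_vars, prop_continuous, n_levels=3, seed=0):
    """Hypothetical generator: rows of (continuous_values, nominal_values)
    drawn from two latent clusters. Not the design used in the paper."""
    rng = random.Random(seed)
    n_cont = round(n_vars * prop_continuous)  # number of continuous variables
    n_nom = n_vars - n_cont                   # remaining variables are nominal
    rows, labels = [], []
    for i in range(n_rows):
        k = i % 2  # assumed two-cluster structure, alternating membership
        # Continuous variables: unit-variance Gaussians with cluster means 0 and 3.
        cont = [rng.gauss(3.0 * k, 1.0) for _ in range(n_cont)]
        # Nominal variables: level equals the cluster index with probability 0.8,
        # otherwise a uniformly random level.
        nom = [k if rng.random() < 0.8 else rng.randrange(n_levels)
               for _ in range(n_nom)]
        rows.append((cont, nom))
        labels.append(k)
    return rows, labels

def kproto_dissimilarity(a, b, gamma=1.0):
    """Squared Euclidean distance on the continuous parts plus
    gamma times the count of mismatching nominal values."""
    cont = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    mismatches = sum(x != y for x, y in zip(a[1], b[1]))
    return cont + gamma * mismatches

# Example: 10 variables, half of them continuous.
rows, labels = simulate_mixed(n_rows=20, n_vars=10, prop_continuous=0.5)
d = kproto_dissimilarity(rows[0], rows[1], gamma=2.0)
```

Varying `prop_continuous` changes the balance between the two variable types, and `gamma` controls how much the nominal mismatches count relative to the continuous distance.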
Keywords
K-prototypes, KAMILA, Mixed-type data clustering, Multiple correspondence analysis
Department
Mathematics and Statistics
Recommended Citation
Jarrett Jimeno, Madhumita Roy, and Cristina Tortora. "Clustering Mixed-Type Data: A Benchmark Study on KAMILA and K-Prototypes." Studies in Classification, Data Analysis, and Knowledge Organization (2021): 83-91. https://doi.org/10.1007/978-3-030-60104-1_10