Publication Date
11-1-2022
Document Type
Article
Publication Title
Applied Soft Computing
Volume
130
DOI
10.1016/j.asoc.2022.109704
Abstract
Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it has the advantages of fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, that is, data containing both continuous and categorical variables, two of the most common types of data. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation design shows its advantages over some state-of-the-art techniques; finally, it is applied to a real data set. The conclusion outlines some future developments.
Funding Number
18-RSG-08-046
Funding Sponsor
San José State University
Keywords
Fuzzy clustering, Mixed-type data, Probabilistic distance clustering
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Department
Mathematics and Statistics
Recommended Citation
Cristina Tortora and Francesco Palumbo. "Clustering mixed-type data using a probabilistic distance algorithm" Applied Soft Computing (2022). https://doi.org/10.1016/j.asoc.2022.109704