Publication Date

11-1-2022

Document Type

Article

Publication Title

Applied Soft Computing

Volume

130

DOI

10.1016/j.asoc.2022.109704

Abstract

Cluster analysis is a widely used unsupervised data analysis technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; it offers fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, that is, data sets that combine continuous and categorical variables, which are among the most common types of data. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation study shows its advantages over some state-of-the-art techniques; finally, the method is applied to a real data set. The conclusion outlines some future developments.
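To make the idea in the abstract concrete, the following is a minimal sketch, not the article's implementation. It assumes the usual probabilistic distance clustering principle adjusted for cluster size, in which the membership of a unit in cluster k is proportional to q_k / d_k (q_k being the cluster size and d_k the dissimilarity to the cluster center). The mixed-type dissimilarity used here (Euclidean distance on continuous variables plus a weighted mismatch fraction on categorical variables) is an illustrative assumption, as are the function names, the weight `w_cat`, and the toy data; the abstract does not specify the dissimilarities the authors actually use.

```python
import numpy as np

def mixed_dissimilarity(x_num, x_cat, c_num, c_cat, w_cat=1.0):
    """Hypothetical mixed-type dissimilarity (an assumption, not the paper's).

    Euclidean distance on the continuous part plus w_cat times the
    fraction of categorical variables that differ from the center.
    """
    d_num = np.linalg.norm(np.asarray(x_num, float) - np.asarray(c_num, float))
    d_cat = np.mean(np.asarray(x_cat) != np.asarray(c_cat))
    return d_num + w_cat * d_cat

def pdq_memberships(dists, sizes, eps=1e-12):
    """Fuzzy memberships with p_k proportional to q_k / d_k.

    eps guards against division by zero when a unit sits on a center.
    """
    ratio = np.asarray(sizes, float) / (np.asarray(dists, float) + eps)
    return ratio / ratio.sum()

# Toy example: one unit, two cluster centers, equal cluster sizes.
x_num, x_cat = [0.0, 0.0], ["a", "y"]
centers = [([0.0, 0.0], ["a", "x"]), ([3.0, 4.0], ["b", "x"])]
q = [0.5, 0.5]

d = [mixed_dissimilarity(x_num, x_cat, c_num, c_cat) for c_num, c_cat in centers]
p = pdq_memberships(d, q)
print(d, p)
```

In this toy case the unit is much closer to the first center, so it receives the larger membership; unequal values in `q` would shift the memberships toward the larger cluster.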

Funding Number

18-RSG-08-046

Funding Sponsor

San José State University

Keywords

Fuzzy clustering, Mixed-type data, Probabilistic distance clustering

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Department

Mathematics and Statistics
