Publication Date

11-1-2022

Document Type

Article

Publication Title

Applied Soft Computing

Volume

130

DOI

10.1016/j.asoc.2022.109704

Abstract

Cluster analysis is a widely used unsupervised technique for finding groups of homogeneous units in a data set. Probabilistic distance clustering adjusted for cluster size (PDQ), discussed in this contribution, belongs to the broad category of clustering methods initially developed for continuous data; its advantages are fuzzy membership and robustness. However, a common issue in clustering is the treatment of mixed-type data, that is, data containing both continuous and categorical variables, which are among the most common types of data. This paper extends PDQ to mixed-type data by using different dissimilarities for different kinds of variables. First, PDQ for mixed-type data is defined; then a simulation study shows its advantages over some state-of-the-art techniques; finally, the method is applied to a real data set. The conclusion outlines some future developments.

Funding Number

18-RSG-08-046

Funding Sponsor

San José State University

Keywords

Fuzzy clustering, Mixed-type data, Probabilistic distance clustering

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Department

Mathematics and Statistics

Recommended Citation

Cristina Tortora and Francesco Palumbo. "Clustering mixed-type data using a probabilistic distance algorithm" Applied Soft Computing (2022). https://doi.org/10.1016/j.asoc.2022.109704

Faculty Research, Scholarly, and Creative Activity

Clustering mixed-type data using a probabilistic distance algorithm
