Publication Date
1-1-2024
Document Type
Article
Publication Title
Computational Statistics
DOI
10.1007/s00180-024-01490-5
Abstract
Data clustering has a long history and refers to a vast range of models and methods that exploit the ever-more-performing numerical optimization algorithms and are designed to find homogeneous groups of observations in data. In this framework, the probability distance clustering (PDC) family methods offer a numerically effective alternative to model-based clustering methods and a more flexible opportunity in the framework of geometric data clustering. Given n J-dimensional data vectors arranged in a data matrix and the number K of clusters, PDC maximizes the joint density function that is defined as the sum of the products between the distance and the probability, both of which are measured for each data vector from each center. This article shows the capabilities of the PDC family, illustrating the R package FPDclustering.
Funding Number
2209974
Funding Sponsor
National Science Foundation
Keywords
Mixed-type data, Probabilistic distance clustering, Soft clustering
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Department
Mathematics and Statistics
Recommended Citation
Cristina Tortora and Francesco Palumbo. "FPDclustering: a comprehensive R package for probabilistic distance clustering based methods" Computational Statistics (2024). https://doi.org/10.1007/s00180-024-01490-5