Handling skewness and directional tails in model-based clustering
Publication Date
8-1-2025
Document Type
Article
Publication Title
Statistical Papers
Volume
66
Issue
5
DOI
10.1007/s00362-025-01723-9
Abstract
Model-based clustering is a powerful approach used in data analysis to unveil underlying patterns or groups within a data set. However, when applied to clusters that exhibit skewness, heavy tails, or both, the classification of data points becomes more challenging. In this study, we introduce two models considering two component-wise transformations of the observed data within a mixture of multiple scaled contaminated normal (MSCN) distributions. MSCN distributions are designed to enable a different tail behavior in each dimension and directional outlier detection in the direction of the principal components. Using the transformed MSCN distributions as components of a mixture, we obtain model-based clustering techniques that allow for (1) flexible cluster shapes in terms of skewness and kurtosis and (2) component-wise and directional outlier detection. We assess the efficacy of the proposed techniques by comparing them, on simulated and real data sets, with model-based clustering methods that perform global or component-wise outlier detection. This comparative analysis aims to show in which practical clustering scenarios the proposed MSCN-based approaches are advantageous.
Funding Number
2209974
Funding Sponsor
European Commission
Keywords
Contaminated normal distribution, Data transformations, EM algorithm, Model-based clustering, Multiple scaled distributions
Department
Mathematics and Statistics
Recommended Citation
Cristina Tortora, Antonio Punzo, and Brian C. Franczak. "Handling skewness and directional tails in model-based clustering" Statistical Papers (2025). https://doi.org/10.1007/s00362-025-01723-9