Handling skewness and directional tails in model-based clustering

Publication Date

8-1-2025

Document Type

Article

Publication Title

Statistical Papers

Volume

66

Issue

5

DOI

10.1007/s00362-025-01723-9

Abstract

Model-based clustering is a powerful approach used in data analysis to uncover underlying patterns or groups within a data set. However, when clusters exhibit skewness, heavy tails, or both, classifying the data points becomes more challenging. In this study, we introduce two models based on two component-wise transformations of the observed data within a mixture of multiple scaled contaminated normal (MSCN) distributions. MSCN distributions are designed to allow different tail behavior in each dimension and to detect outliers directionally, along the principal components. Using the transformed MSCN distributions as mixture components, we obtain model-based clustering techniques that allow for 1) flexible cluster shapes in terms of skewness and kurtosis and 2) component-wise and directional outlier detection. We assess the efficacy of the proposed techniques on simulated and real data sets by comparing them with model-based clustering methods that perform global or component-wise outlier detection. This comparative analysis aims to identify the practical clustering scenarios in which the proposed MSCN-based approaches are advantageous.
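To make the idea of per-dimension tails and directional outlier detection concrete, here is a minimal Python sketch of a single MSCN density. It assumes the parametrization commonly used for MSCN distributions: the data are rotated onto orthonormal principal directions, and each rotated coordinate y_h independently follows a univariate contaminated normal, alpha_h * N(0, lam_h) + (1 - alpha_h) * N(0, eta_h * lam_h), with eta_h > 1 inflating the variance of the "bad" points. A point is flagged as an outlier along direction h when its h-th coordinate is more likely to come from the inflated part. The function names and parameters (directions, lam, alpha, eta) are hypothetical, and the sketch does not implement the paper's data transformations, the mixture, or the EM fitting.

```python
import math


def log_normal_pdf(y, var):
    """Log density of a univariate N(0, var) distribution at y."""
    return -0.5 * (y * y / var + math.log(2.0 * math.pi * var))


def normal_pdf(y, var):
    """Density of a univariate N(0, var) distribution at y."""
    return math.exp(log_normal_pdf(y, var))


def rotate(x, mu, directions):
    """Coordinates of x - mu along each orthonormal direction (hypothetical helper)."""
    centered = [xi - mi for xi, mi in zip(x, mu)]
    return [sum(g * c for g, c in zip(d, centered)) for d in directions]


def mscn_density(x, mu, directions, lam, alpha, eta):
    """Assumed MSCN density: a product over directions h of
    alpha_h * N(y_h; 0, lam_h) + (1 - alpha_h) * N(y_h; 0, eta_h * lam_h).
    The rotation is orthogonal, so no Jacobian factor is needed."""
    density = 1.0
    for y, l, a, e in zip(rotate(x, mu, directions), lam, alpha, eta):
        density *= a * normal_pdf(y, l) + (1.0 - a) * normal_pdf(y, e * l)
    return density


def directional_outlier_flags(x, mu, directions, lam, alpha, eta):
    """Flag direction h when the inflated ('bad') part explains y_h better,
    i.e. when the posterior probability of being a good point is below 0.5.
    Comparing log-weights avoids underflow for points far in the tails."""
    flags = []
    for y, l, a, e in zip(rotate(x, mu, directions), lam, alpha, eta):
        good = math.log(a) + log_normal_pdf(y, l)
        bad = math.log1p(-a) + log_normal_pdf(y, e * l)
        flags.append(good < bad)
    return flags
```

For example, with axis-aligned directions, lam = [1, 1], alpha = [0.95, 0.95] and eta = [10, 10], the point [0, 6] is flagged only along the second direction, so `directional_outlier_flags([0.0, 6.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [0.95, 0.95], [10.0, 10.0])` returns `[False, True]`. A single global contaminated normal would instead label the whole point as an outlier without saying in which direction it deviates.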

Funding Number

2209974

Funding Sponsor

European Commission

Keywords

Contaminated normal distribution, Data transformations, EM algorithm, Model-based clustering, Multiple scaled distributions

Department

Mathematics and Statistics
