CMI: Cluster-Centric Missing Value Imputation with Feature Consistency

Publication Date

1-1-2024

Document Type

Conference Proceeding

Publication Title

2024 IEEE 14th Annual Computing and Communication Workshop and Conference, CCWC 2024

DOI

10.1109/CCWC60891.2024.10427575

First Page

521

Last Page

526

Abstract

In the realm of data analysis, addressing missing data poses a critical challenge with implications for both research and practical applications. The absence of data points in datasets can significantly undermine the reliability and performance of predictive models, potentially leading to erroneous conclusions. This paper introduces a novel approach, Cluster-Centric Missing Value Imputation (CMI), designed specifically for imputing missing values in numerical features using clustering techniques. CMI is augmented by Shapley Additive Explanations (SHAP) values to interpret feature significance post-imputation. The core principle of CMI lies in recognizing that data points within the same cluster often share similar key attributes, enhancing the transparency and understandability of the imputation process. Experimental evaluation on two medical datasets, the Indian Liver Patient Dataset (ILPD) and Chronic Kidney Disease Data (CKD), demonstrates the superior performance and interpretability of CMI compared to traditional imputation methods such as mean imputation, k-nearest neighbors (KNN) imputation, and Multiple Imputation by Chained Equations (MICE). The findings suggest that CMI represents a significant advancement in data analysis, providing an effective and interpretable solution for handling missing data in healthcare research.
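The cluster-based idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm, initialization, or fill statistic CMI uses, so everything below is an assumption made for illustration: a plain k-means on provisionally mean-filled data, followed by filling each missing entry with the mean of the observed values in its cluster. The function name `cluster_impute` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def cluster_impute(X, n_clusters=2, n_iter=20, seed=0):
    """Illustrative only: fill NaNs with the per-cluster column mean.

    Assumes k-means and a mean fill; the paper's actual procedure may differ.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Provisional fill with column means so that distances can be computed.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Plain Lloyd's k-means, initialised from randomly chosen rows.
    rng = np.random.default_rng(seed)
    centers = filled[rng.choice(len(filled), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(filled[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = filled[labels == k].mean(axis=0)

    # Replace each missing entry with the mean of the values observed
    # for that feature within the same cluster.
    out = X.copy()
    for k in range(n_clusters):
        rows = labels == k
        if not rows.any():
            continue
        for j in range(X.shape[1]):
            observed = X[rows & ~missing[:, j], j]
            # Fall back to the global column mean if the cluster has
            # no observed value for this feature.
            fill = observed.mean() if observed.size else col_means[j]
            out[rows & missing[:, j], j] = fill
    return out, labels


# Toy data (made up for this sketch, not from ILPD or CKD).
X_demo = np.array([
    [1.0, 2.0],
    [1.2, np.nan],
    [0.8, 1.8],
    [10.0, 20.0],
    [10.5, np.nan],
    [9.5, 19.0],
])
imputed, labels = cluster_impute(X_demo, n_clusters=2)
```

Observed values are left untouched and only the missing cells are filled. The cluster assignments depend on the random initialization, and the provisional mean fill can pull incomplete rows toward the wrong cluster, so a practical version would likely need feature scaling and a more careful initialization.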

Keywords

Clustering, Feature Importance Analysis, Healthcare Data Analysis, Missing Value Imputation, SHAP Values

Department

Applied Data Science
