CMI: Cluster-Centric Missing Value Imputation with Feature Consistency
Publication Date
1-1-2024
Document Type
Conference Proceeding
Publication Title
2024 IEEE 14th Annual Computing and Communication Workshop and Conference, CCWC 2024
DOI
10.1109/CCWC60891.2024.10427575
First Page
521
Last Page
526
Abstract
In the realm of data analysis, addressing missing data poses a critical challenge with implications for both research and practical applications. The absence of data points in datasets can significantly undermine the reliability and performance of predictive models, potentially leading to erroneous conclusions. This paper introduces a novel approach, Cluster-Centric Missing Value Imputation (CMI), designed specifically for imputing missing values in numerical features using clustering techniques. CMI is augmented by Shapley Additive Explanations (SHAP) values to interpret feature significance post-imputation. The core principle of CMI lies in recognizing that data points within the same cluster often share similar key attributes, enhancing the transparency and understandability of the imputation process. Experimental evaluation on two medical datasets, the Indian Liver Patient Dataset (ILPD) and Chronic Kidney Disease Data (CKD), demonstrates the superior performance and interpretability of CMI compared to traditional imputation methods such as mean imputation, k-nearest neighbors (KNN) imputation, and Multiple Imputation by Chained Equations (MICE). The findings suggest that CMI represents a significant advancement in data analysis, providing an effective and interpretable solution for handling missing data in healthcare research.
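The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes (filling a missing numerical value from other records in the same cluster), not the authors' CMI implementation. Every name here (`column_means`, `kmeans_labels`, `cluster_impute`), the choice of k-means, the temporary mean fill used to compute distances, and the use of the cluster mean as the imputed value are assumptions made for illustration. The SHAP analysis is not covered.

```python
import math
import random


def column_means(rows):
    """Per-column mean over observed (non-None) entries; None if nothing is observed."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


def kmeans_labels(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on complete numeric points; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_impute(rows, k=2, seed=0):
    """Return a copy of rows with each None replaced by its cluster's feature mean."""
    global_means = column_means(rows)
    if any(m is None for m in global_means):
        raise ValueError("every feature needs at least one observed value")
    # Step 1 (assumption): temporary global-mean fill so distances can be computed.
    filled = [[global_means[j] if v is None else v for j, v in enumerate(r)]
              for r in rows]
    # Step 2: cluster the temporarily filled rows.
    labels = kmeans_labels(filled, k, seed=seed)
    # Step 3: per-cluster feature means, using originally observed values only.
    cluster_means = {
        c: column_means([r for r, lab in zip(rows, labels) if lab == c])
        for c in set(labels)
    }
    # Step 4: fill each gap from its own cluster, falling back to the global mean.
    result = []
    for r, lab in zip(rows, labels):
        new_row = []
        for j, v in enumerate(r):
            if v is None:
                m = cluster_means[lab][j]
                v = global_means[j] if m is None else m
            new_row.append(v)
        result.append(new_row)
    return result
```

Calling `cluster_impute(rows, k=2)` on a list of equal-length rows, with `None` marking missing entries, returns a new list and leaves the input unchanged. The sketch clusters on raw feature values; on real clinical data such as ILPD or CKD, features would normally be standardized first so that no single measurement dominates the distance.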
Keywords
Clustering, Feature Importance Analysis, Healthcare Data Analysis, Missing Value Imputation, SHAP Values
Department
Applied Data Science
Recommended Citation
Megha Gupta, Shripal Shah, Mohammad Masum, and Sai Chandra Kosaraju. "CMI: Cluster-Centric Missing Value Imputation with Feature Consistency" 2024 IEEE 14th Annual Computing and Communication Workshop and Conference, CCWC 2024 (2024): 521-526. https://doi.org/10.1109/CCWC60891.2024.10427575