Faculty Research, Scholarly, and Creative Activity

CMI: Cluster-Centric Missing Value Imputation with Feature Consistency

Megha Gupta, Alumni
Shripal Shah, Alumni
Mohammad Masum, San Jose State University
Sai Chandra Kosaraju, University of Nevada, Reno

Publication Date

1-1-2024

Document Type

Conference Proceeding

Publication Title

2024 IEEE 14th Annual Computing and Communication Workshop and Conference, CCWC 2024

DOI

10.1109/CCWC60891.2024.10427575

First Page

521

Last Page

526

Abstract

In the realm of data analysis, addressing missing data poses a critical challenge with implications for both research and practical applications. The absence of data points in datasets can significantly undermine the reliability and performance of predictive models, potentially leading to erroneous conclusions. This paper introduces a novel approach, Cluster-Centric Missing Value Imputation (CMI), designed specifically for imputing missing values in numerical features using clustering techniques. CMI is augmented by Shapley Additive Explanations (SHAP) values to interpret feature significance post-imputation. The core principle of CMI lies in recognizing that data points within the same cluster often share similar key attributes, enhancing the transparency and understandability of the imputation process. Experimental evaluation on two medical datasets, the Indian Liver Patient Dataset (ILPD) and Chronic Kidney Disease Data (CKD), demonstrates the superior performance and interpretability of CMI compared to traditional imputation methods such as mean imputation, k-nearest neighbors (KNN) imputation, and Multiple Imputation by Chained Equations (MICE). The findings suggest that CMI represents a significant advancement in data analysis, providing an effective and interpretable solution for handling missing data in healthcare research.
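The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of cluster-based imputation, not the authors' CMI implementation. It makes several assumptions of its own: missing values are marked as `None`, rows are first filled provisionally with column means, clusters come from a small hand-written k-means with evenly spaced starting centroids, and each gap is then replaced by the mean of the observed values in the same cluster. The function name `cluster_impute` and its parameters are hypothetical, and the SHAP-based interpretation step is not shown.

```python
from statistics import mean


def cluster_impute(rows, k=2, iters=10):
    """Fill None entries with the per-cluster mean (illustrative sketch only)."""
    n_cols = len(rows[0])
    # Column means over observed values, used as a provisional fill.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    filled = [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

    # Plain k-means on the provisionally filled data (assumes 1 <= k <= len(rows)).
    # Centroids start at evenly spaced rows so the result is deterministic.
    step = max(1, len(filled) // k)
    centroids = [list(filled[i * step]) for i in range(k)]
    labels = [0] * len(filled)
    for _ in range(iters):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
            for row in filled
        ]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [row for row, lab in zip(filled, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]

    # Replace each missing entry with the mean of the observed values in its cluster,
    # falling back to the column mean if the cluster has no observed value there.
    result = []
    for row, lab in zip(rows, labels):
        new_row = []
        for j, v in enumerate(row):
            if v is None:
                observed = [r[j] for r, l in zip(rows, labels) if l == lab and r[j] is not None]
                v = mean(observed) if observed else col_means[j]
            new_row.append(v)
        result.append(new_row)
    return result


# Toy data with two well-separated groups; the second feature is missing in two rows.
sample = [[1.0, 2.0], [1.2, None], [0.8, 2.2], [9.0, 10.0], [9.2, None], [8.8, 10.4]]
imputed = cluster_impute(sample, k=2)
```

On this toy data the two gaps are filled from their own groups (roughly 2.1 and 10.2) rather than from the global column mean of 6.15, which is the behavior the abstract's "same cluster, similar attributes" principle relies on.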

Keywords

Clustering, Feature Importance Analysis, Healthcare Data Analysis, Missing Value Imputation, SHAP Values

Department

Applied Data Science

Recommended Citation

Megha Gupta, Shripal Shah, Mohammad Masum, and Sai Chandra Kosaraju. "CMI: Cluster-Centric Missing Value Imputation with Feature Consistency" 2024 IEEE 14th Annual Computing and Communication Workshop and Conference, CCWC 2024 (2024): 521-526. https://doi.org/10.1109/CCWC60891.2024.10427575
