ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction
Publication Date
1-1-2024
Document Type
Conference Proceeding
Publication Title
Proceedings - 2024 IEEE 48th Annual Computers, Software, and Applications Conference, COMPSAC 2024
DOI
10.1109/COMPSAC61105.2024.00052
First Page
320
Last Page
325
Abstract
In medical data analysis, high-dimensional datasets raise problems of computational complexity, resource utilization, and model interpretability. Principal Component Analysis (PCA), a prevalent dimension reduction technique, addresses these problems by transforming high-dimensional data into a lower-dimensional representation that preserves as much variance as possible. However, PCA has limitations in high-dimensional contexts: it can lose information, and because it uses the entire dataset in the transformation, its computational demands grow for sizable datasets. In this paper, we propose ActivePCA, a novel framework that integrates PCA and Active Machine Learning (AML) so that only a subset of the dataset is used in the dimension reduction process. In the first step, the framework identifies the most informative instances in the dataset. In the second step, ActivePCA applies PCA to the selected subset only. To demonstrate its effectiveness, we applied the framework to six electronic health record (EHR) datasets of varying dimensions. The framework significantly reduces both the number of observations and the number of dimensions, using AML and PCA respectively, and improves the performance of machine learning (ML) classifiers. ActivePCA reduces labeling cost on the EHR datasets by approximately 50% to 80% compared with the original datasets. In addition, ActivePCA achieves significantly higher accuracy with the reduced dimensions, showing the benefit of applying AML before PCA.
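The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the AML query strategy or the learner, so the sketch assumes pool-based uncertainty sampling with a small logistic regression, and it runs on synthetic data rather than the EHR datasets. All function names and parameters (`select_informative`, `fit_pca`, `n_initial`, `n_queries`) are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    # Clip to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logreg(X, y, lr=0.1, steps=300):
    """Gradient-descent logistic regression (assumed stand-in learner)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        grad = _sigmoid(X @ w + b) - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def select_informative(X, y, n_initial=10, n_queries=40, seed=0):
    """Step 1 (assumed strategy): uncertainty sampling over an unlabeled pool."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_initial, replace=False)]
    seen = set(labeled)
    pool = [i for i in range(len(X)) if i not in seen]
    for _ in range(n_queries):
        w, b = fit_logreg(X[labeled], y[labeled])
        p = _sigmoid(X[pool] @ w + b)
        # Query the pool instance whose predicted probability is closest to 0.5.
        pick = pool[int(np.argmin(np.abs(p - 0.5)))]
        labeled.append(pick)
        pool.remove(pick)
    return np.array(labeled)

def fit_pca(X_subset, n_components):
    """Step 2: fit PCA (via SVD) on the selected subset only."""
    mean = X_subset.mean(axis=0)
    _, _, vt = np.linalg.svd(X_subset - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(X, mean, components):
    """Project any rows onto the components learned from the subset."""
    return (X - mean) @ components.T

# Synthetic stand-in data: 200 observations, 20 features, binary label.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

idx = select_informative(X, y)           # 50 of 200 rows get labeled
mean, comps = fit_pca(X[idx], n_components=5)
Z = transform(X, mean, comps)            # reduced representation of all rows
```

In this sketch only the 50 selected rows need labels and only those rows enter the PCA fit; the learned projection is then applied to every row. The subset size and number of components are arbitrary choices for illustration.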
Keywords
Active Machine Learning, Dimension Reduction, Electronic Health Records Datasets, PCA, Reduce Labeling Cost
Department
Applied Data Science
Recommended Citation
Priyanka Bhyregowda, Mohammad Masum, Lohuwa Mamudu, Mohammed Chowdhury, Sai Chandra Kosaraju, and Hossain Shahriar. "ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction" Proceedings - 2024 IEEE 48th Annual Computers, Software, and Applications Conference, COMPSAC 2024 (2024): 320-325. https://doi.org/10.1109/COMPSAC61105.2024.00052