ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction
Published in | 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) pp. 320 - 325 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 02.07.2024 |
Summary: | In medical data analysis, high-dimensional datasets pose challenges of computational complexity, resource utilization, and model interpretability. Principal Component Analysis (PCA), a prevalent dimension reduction technique, addresses these challenges by transforming high-dimensional data into a lower-dimensional representation while preserving maximum variance. However, PCA has limitations in high-dimensional contexts: it can lose information, and because it uses the entire dataset in the transformation, its computational demands grow for sizable datasets. In this paper, we propose a novel framework, ActivePCA, that integrates PCA and Active Machine Learning (AML) so that only a subset of the dataset is used in the dimension reduction process. In the first step, the framework selects the most informative instances from the dataset. In the second step, ActivePCA applies PCA to the selected subset only. To demonstrate its effectiveness, we applied the proposed framework to six different electronic health record (EHR) datasets with varying dimensions. The framework significantly reduces both the number of observations and the number of dimensions, using AML and PCA respectively, resulting in improved performance from ML classifiers. ActivePCA reduces labeling cost on the EHR datasets by approximately 50% to 80% compared to the original dimensions of the datasets. In addition, ActivePCA achieves significantly higher accuracy using the reduced dimensions, showing the effectiveness of AML while applying PCA. |
---|---|
ISSN: | 2836-3795 |
DOI: | 10.1109/COMPSAC61105.2024.00052 |
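
The two-step procedure described in the summary (select informative instances, then fit PCA on that subset only) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state which query strategy or base learner ActivePCA uses, so the sketch assumes pool-based least-confidence sampling with a logistic regression model from scikit-learn. The function names `select_informative` and `active_pca`, the parameters `n_per_class` and `n_queries`, and the synthetic data are all hypothetical, and the known label array `y` stands in for the labeling oracle.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression


def select_informative(X, y, n_per_class=10, n_queries=80, seed=0):
    """Step 1 (assumed strategy): pool-based least-confidence sampling.

    Returns the indices of the rows chosen for labeling. `y` simulates
    the oracle: a label is only read once its row has been selected.
    """
    rng = np.random.default_rng(seed)
    # Seed set: a few random rows per class so the first fit sees every class.
    selected = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        selected.extend(rng.choice(members, size=n_per_class, replace=False).tolist())
    pool = np.setdiff1d(np.arange(len(X)), selected)

    model = LogisticRegression(max_iter=1000)
    for _ in range(n_queries):
        model.fit(X[selected], y[selected])
        proba = model.predict_proba(X[pool])
        # Least confidence: the row whose top class probability is lowest.
        pick = int(np.argmin(proba.max(axis=1)))
        selected.append(int(pool[pick]))
        pool = np.delete(pool, pick)
    return np.array(selected)


def active_pca(X, y, n_components=5, **selection_kwargs):
    """Step 2: fit PCA on the selected subset only, not on the full dataset."""
    idx = select_informative(X, y, **selection_kwargs)
    pca = PCA(n_components=n_components).fit(X[idx])
    return pca, idx


# Synthetic stand-in for a tabular dataset (not one of the paper's EHR datasets).
X, y = make_classification(n_samples=500, n_features=30, n_informative=10,
                           random_state=0)
pca, idx = active_pca(X, y, n_components=5, n_per_class=10, n_queries=80)
X_reduced = pca.transform(X[idx])
print(X_reduced.shape)  # (100, 5): 20 seed rows + 80 queried rows, 5 components
```

In this sketch only 100 of the 500 rows are ever labeled, and the principal components are estimated from those rows alone. Whether that matches the selection budget or stopping rule used in the paper is not stated in this record.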