ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction
Published in | 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) pp. 320 - 325 |
---|---|
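Main Authors | Bhyregowda, Priyanka; Masum, Mohammad; Mamudu, Lohuwa; Chowdhury, Mohammed; Kosaraju, Sai Chandra; Shahriar, Hossain |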
Format | Conference Proceeding |
Language | English |
Published | IEEE 02.07.2024 |
Subjects | |
Abstract | In medical data analysis, high-dimensional datasets raise problems of computational complexity, resource utilization, and model interpretability. Principal Component Analysis (PCA), a prevalent dimension reduction technique, addresses these problems by transforming high-dimensional data into a lower-dimensional representation while preserving maximum variance. However, PCA has limitations in high-dimensional contexts: it can lose information, and because it uses the entire dataset in the transformation process, its computational demands grow for sizable datasets. In this paper, we propose ActivePCA, a novel framework that integrates PCA and Active Machine Learning (AML) so that only a subset of the dataset is used in the dimension reduction process. In the first step, the framework selects the most informative instances from the dataset. In the second step, ActivePCA applies PCA to the selected subset only. To demonstrate its effectiveness, we applied the framework to six different EHR datasets with varying dimensions. The framework significantly reduces both the number of observations (through AML) and the number of dimensions (through PCA), resulting in improved performance from ML classifiers. ActivePCA reduces labeling cost on the EHR datasets by approximately 50% to 80% compared to the original datasets. In addition, ActivePCA achieves significantly higher accuracy using the reduced dimensions, showing the effectiveness of AML when applying PCA. |
---|---|
Author | Chowdhury, Mohammed; Shahriar, Hossain; Mamudu, Lohuwa; Masum, Mohammad; Kosaraju, Sai Chandra; Bhyregowda, Priyanka |
Author_xml | – sequence: 1 givenname: Priyanka surname: Bhyregowda fullname: Bhyregowda, Priyanka email: priyanka.bhyregowda@sjsuo.edu organization: San Jose State University – sequence: 2 givenname: Mohammad surname: Masum fullname: Masum, Mohammad email: mohammad.masum@sjsu.edu organization: San Jose State University – sequence: 3 givenname: Lohuwa surname: Mamudu fullname: Mamudu, Lohuwa email: lohuwam@fullerton.edu organization: California State University, Fullerton – sequence: 4 givenname: Mohammed surname: Chowdhury fullname: Chowdhury, Mohammed email: m-chowdhury@wiu.edu organization: Western Illinois University – sequence: 5 givenname: Sai Chandra surname: Kosaraju fullname: Kosaraju, Sai Chandra email: sai.kosaraju@unlv.edu organization: University of Nevada, Las Vegas – sequence: 6 givenname: Hossain surname: Shahriar fullname: Shahriar, Hossain email: hshariar@uwf.edu organization: University of West Florida, Pensacola, USA |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/COMPSAC61105.2024.00052 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library Online; IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9798350376968 |
EISSN | 2836-3795 |
EndPage | 325 |
ExternalDocumentID | 10633364 |
Genre | orig-research |
GroupedDBID | 6IE 6IH ALMA_UNASSIGNED_HOLDINGS CBEJK RIE RIO |
ID | FETCH-ieee_primary_106333643 |
IEDL.DBID | RIE |
IngestDate | Wed Sep 04 05:53:21 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-ieee_primary_106333643 |
ParticipantIDs | ieee_primary_10633364 |
PublicationCentury | 2000 |
PublicationDate | 2024-July-2 |
PublicationDateYYYYMMDD | 2024-07-02 |
PublicationDate_xml | – month: 07 year: 2024 text: 2024-July-2 day: 02 |
PublicationDecade | 2020 |
PublicationTitle | 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) |
PublicationTitleAbbrev | COMPSAC |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0036852 |
Score | 3.860428 |
Snippet | In medical data analysis, addressing challenges from high-dimensional datasets is crucial due to issues related to computational complexity, resource... |
SourceID | ieee |
SourceType | Publisher |
StartPage | 320 |
SubjectTerms | Active Machine Learning; Computational modeling; Costs; Data analysis; Dimension Reduction; Dimensionality reduction; Electronic Health Records Datasets; Labeling; PCA; Reduce Labeling Cost; Resource management; Software |
Title | ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction |
URI | https://ieeexplore.ieee.org/document/10633364 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |
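
The two-step procedure described in the abstract (select informative instances, then fit PCA on that subset only) can be illustrated with a minimal sketch. The abstract does not say which query strategy or classifier ActivePCA uses, so everything here is an assumption for illustration: uncertainty sampling with a nearest-centroid model stands in for the AML step, PCA is computed by power iteration, and the names `select_informative`, `pca_fit`, `pca_transform`, and `active_pca` are hypothetical. The data is synthetic and is not one of the EHR datasets.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_informative(X, y, seed_idx, budget):
    """Step 1 (assumed strategy): uncertainty sampling with a nearest-centroid model.

    seed_idx must contain at least one example of each class (0 and 1).
    budget is the total number of labeled instances, seed included.
    """
    labeled = list(seed_idx)
    pool = [i for i in range(len(X)) if i not in labeled]
    while len(labeled) < budget and pool:
        # Refit one centroid per class on the currently labeled points.
        cents = [centroid([X[i] for i in labeled if y[i] == c]) for c in (0, 1)]

        def margin(i):
            # A small gap between the two centroid distances means high uncertainty.
            return abs(math.dist(X[i], cents[0]) - math.dist(X[i], cents[1]))

        pick = min(pool, key=margin)
        pool.remove(pick)
        labeled.append(pick)  # stands in for asking an oracle for y[pick]
    return labeled


def pca_fit(rows, k, iters=200):
    """Step 2: top-k principal directions via power iteration with deflation."""
    mean = centroid(rows)
    Z = [[v - m for v, m in zip(r, mean)] for r in rows]
    d = len(mean)
    # Sample covariance matrix (d x d).
    C = [[sum(z[a] * z[b] for z in Z) / (len(Z) - 1) for b in range(d)]
         for a in range(d)]
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next direction.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return mean, comps


def pca_transform(rows, mean, comps):
    """Project rows onto the fitted components after centering."""
    return [[sum((v - m) * c for v, m, c in zip(r, mean, comp)) for comp in comps]
            for r in rows]


def active_pca(X, y, seed_idx, budget, k):
    """Fit PCA on the actively selected subset, then project the full dataset."""
    chosen = select_informative(X, y, seed_idx, budget)
    mean, comps = pca_fit([X[i] for i in chosen], k)
    return chosen, pca_transform(X, mean, comps)


# Synthetic two-class data: 60 rows, 5 features (illustrative only).
data_rng = random.Random(1)
X = [[data_rng.gauss(c * 3, 1) for _ in range(5)] for c in (0, 1) for _ in range(30)]
y = [c for c in (0, 1) for _ in range(30)]
chosen, Z = active_pca(X, y, seed_idx=[0, 30], budget=20, k=2)
```

In this sketch only the 20 selected rows need labels and only those rows enter the covariance computation, which is the source of the labeling and computation savings the abstract refers to; the remaining rows are projected with the components fitted on the subset.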