ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction

Bibliographic Details
Published in 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 320-325
Main Authors Bhyregowda, Priyanka; Masum, Mohammad; Mamudu, Lohuwa; Chowdhury, Mohammed; Kosaraju, Sai Chandra; Shahriar, Hossain
Format Conference Proceeding
Language English
Published IEEE 02.07.2024
Abstract In medical data analysis, high-dimensional datasets pose challenges for computational complexity, resource utilization, and model interpretability. Principal Component Analysis (PCA), a prevalent dimension reduction technique, addresses these challenges by transforming high-dimensional data into a lower-dimensional representation while preserving maximum variance. However, PCA has limitations in high-dimensional contexts: it can lose information, and because it uses the entire dataset in the transformation process, its computational demands grow with sizable datasets. In this paper, we propose a novel framework, ActivePCA, that integrates PCA and Active Machine Learning (AML) so that only a subset of the dataset is used in the dimension reduction process. In the first step, the framework selects the most informative instances from the dataset. In the second step, ActivePCA applies PCA to the selected subset only. To demonstrate effectiveness, we applied the proposed framework to six different EHR datasets with varying dimensions. The framework significantly reduces both the number of observations and the number of dimensions, using AML and PCA respectively, resulting in improved performance from ML classifiers. ActivePCA reduces labeling cost on the EHR datasets by approximately 50% to 80% compared to the original datasets. In addition, ActivePCA achieves significantly higher accuracy using the reduced dimensions, showing the effectiveness of AML when applying PCA.
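The two steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the query strategy or the classifier, so pool-based uncertainty sampling with a small logistic regression is assumed here, and the function names (`select_informative`, `fit_pca`, `active_pca`), the parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, steps=200):
    """Tiny logistic regression by gradient descent (assumed stand-in classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def select_informative(X, y, n_initial=10, n_queries=40, seed=0):
    """Step 1 (assumed strategy): pool-based uncertainty sampling.

    Starts from a random labeled seed set and repeatedly adds the pool
    instance whose predicted probability is closest to 0.5.
    """
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_initial, replace=False)]
    seen = set(labeled)
    pool = [i for i in range(len(X)) if i not in seen]
    for _ in range(n_queries):
        w = fit_logreg(X[labeled], y[labeled])
        p = predict_proba(w, X[pool])
        pick = pool[int(np.argmin(np.abs(p - 0.5)))]  # most uncertain instance
        labeled.append(pick)
        pool.remove(pick)
    return np.array(labeled)

def fit_pca(X, n_components):
    """Step 2: PCA via SVD of the centered matrix; returns (mean, components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def active_pca(X, y, n_components=2, **kwargs):
    """Fit PCA on the actively selected subset only, then project all rows."""
    idx = select_informative(X, y, **kwargs)
    mean, comps = fit_pca(X[idx], n_components)
    return idx, (X - mean) @ comps.T

# Synthetic stand-in data (not one of the paper's EHR datasets).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

idx, Z = active_pca(X, y, n_components=2, n_initial=10, n_queries=40)
print(len(idx), Z.shape)
```

In this sketch only the 50 selected rows need labels and only those rows enter the PCA fit, which is the sense in which the abstract speaks of reduced labeling cost; the remaining rows are merely projected with the fitted components.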
Author Chowdhury, Mohammed
Shahriar, Hossain
Mamudu, Lohuwa
Masum, Mohammad
Kosaraju, Sai Chandra
Bhyregowda, Priyanka
Author_xml – sequence: 1
  givenname: Priyanka
  surname: Bhyregowda
  fullname: Bhyregowda, Priyanka
  email: priyanka.bhyregowda@sjsu.edu
  organization: San Jose State University
– sequence: 2
  givenname: Mohammad
  surname: Masum
  fullname: Masum, Mohammad
  email: mohammad.masum@sjsu.edu
  organization: San Jose State University
– sequence: 3
  givenname: Lohuwa
  surname: Mamudu
  fullname: Mamudu, Lohuwa
  email: lohuwam@fullerton.edu
  organization: California State University,Fullerton
– sequence: 4
  givenname: Mohammed
  surname: Chowdhury
  fullname: Chowdhury, Mohammed
  email: m-chowdhury@wiu.edu
  organization: Western Illinois University
– sequence: 5
  givenname: Sai Chandra
  surname: Kosaraju
  fullname: Kosaraju, Sai Chandra
  email: sai.kosaraju@unlv.edu
  organization: University of Nevada,Las Vegas
– sequence: 6
  givenname: Hossain
  surname: Shahriar
  fullname: Shahriar, Hossain
  email: hshariar@uwf.edu
  organization: University of West Florida,Pensacola,USA
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/COMPSAC61105.2024.00052
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library Online
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350376968
EISSN 2836-3795
EndPage 325
ExternalDocumentID 10633364
Genre orig-research
GroupedDBID 6IE
6IH
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIE
RIO
ID FETCH-ieee_primary_106333643
IEDL.DBID RIE
IngestDate Wed Sep 04 05:53:21 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-ieee_primary_106333643
ParticipantIDs ieee_primary_10633364
PublicationCentury 2000
PublicationDate 2024-July-2
PublicationDateYYYYMMDD 2024-07-02
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-July-2
  day: 02
PublicationDecade 2020
PublicationTitle 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)
PublicationTitleAbbrev COMPSAC
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0036852
Score 3.860428
SourceID ieee
SourceType Publisher
StartPage 320
SubjectTerms Active Machine Learning
Computational modeling
Costs
Data analysis
Dimension Reduction
Dimensionality reduction
Electronic Health Records Datasets
Labeling
PCA
Reduce Labeling Cost
Resource management
Software
Title ActivePCA: A Novel Framework Integrating PCA and Active Machine Learning for Efficient Dimension Reduction
URI https://ieeexplore.ieee.org/document/10633364