Consensus Clustering-Based Undersampling Approach to Imbalanced Learning

Bibliographic Details
Published in Scientific Programming Vol. 2019; no. 2019; pp. 1-14
Main Author Onan, Aytuğ
Format Journal Article
Language English
Published Cairo, Egypt Hindawi Publishing Corporation 01.01.2019
Hindawi
John Wiley & Sons, Inc
Abstract Class imbalance is an important problem in machine learning applications: one class (the minority class) has very few instances, while the other (the majority class) has a very large number. Imbalanced datasets arise in several important real-world applications, including medical diagnosis, malware detection, anomaly identification, bankruptcy prediction, and spam filtering. In this paper, we present a consensus clustering-based undersampling approach to imbalanced learning, in which the majority class is undersampled using a consensus clustering scheme. The empirical analysis uses 44 small-scale and 2 large-scale imbalanced classification benchmarks. In the consensus clustering schemes, five clustering algorithms (k-means, k-modes, k-means++, self-organizing maps, and the DIANA algorithm) and their combinations were considered. In the classification phase, five supervised learning methods (naïve Bayes, logistic regression, support vector machines, random forests, and the k-nearest neighbor algorithm) and three ensemble learning methods (AdaBoost, bagging, and the random subspace algorithm) were used. The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
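The abstract does not spell out the consensus function or the sampling rule, so the following is only a minimal sketch of the general idea, under stated assumptions. Several base partitions of the majority class are combined through pairwise co-association (one common consensus function, assumed here), and majority instances are then drawn round-robin from the resulting consensus groups (also an assumption). Plain k-means with different values of k stands in for the paper's heterogeneous base clusterers, and every function name and parameter below is hypothetical.

```python
import random
from collections import defaultdict


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_groups(labelings, threshold=0.5):
    """Assumed consensus function: link two points when they share a cluster
    in more than `threshold` of the base partitions, then return the
    connected components as lists of point indices."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(lab[i] == lab[j] for lab in labelings) / len(labelings)
            if agree > threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def undersample_majority(majority, n_keep, ks=(2, 3, 4), seed=0):
    """Keep `n_keep` majority points, drawn round-robin from consensus groups.
    k-means with several k values is a stand-in for heterogeneous clusterers."""
    rng = random.Random(seed)
    labelings = [kmeans_labels(majority, k, rng) for k in ks]
    pools = [rng.sample(g, len(g)) for g in consensus_groups(labelings)]
    n_keep = min(n_keep, len(majority))
    kept = []
    while len(kept) < n_keep:
        for pool in pools:
            if pool and len(kept) < n_keep:
                kept.append(pool.pop())
    return [majority[i] for i in sorted(kept)]
```

In a pipeline of this kind, the retained majority points would typically be combined with all minority instances before a classifier is trained. Round-robin sampling gives small consensus groups more weight than proportional sampling would; that is a design choice of this sketch, not something the record states.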
Copyright © 2019 Aytuğ Onan. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0
DOI 10.1155/2019/5901087
Discipline Computer Science
EISSN 1875-919X
Editor García-Díaz, Vicente
ISSN 1058-9244
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
License This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
http://creativecommons.org/licenses/by/4.0
ORCID 0000-0002-9434-5880
OpenAccessLink https://dx.doi.org/10.1155/2019/5901087
SubjectTerms Algorithms
Artificial intelligence
Bankruptcy
Bayesian analysis
Classification
Clustering
Data mining
Datasets
Empirical analysis
Identification
Information science
Learning
Machine learning
Malware
Medical diagnosis
Performance prediction
Regression analysis
Self organizing maps
Support vector machines
Teaching methods
URI https://search.emarefa.net/detail/BIM-1210743
https://dx.doi.org/10.1155/2019/5901087
https://www.proquest.com/docview/2193133422