Consensus Clustering-Based Undersampling Approach to Imbalanced Learning
Published in | Scientific Programming, Vol. 2019, no. 2019, pp. 1-14 |
---|---|
Main Author | Onan, Aytuğ |
Format | Journal Article |
Language | English |
Published | Cairo, Egypt; Hindawi Publishing Corporation; 01.01.2019; Hindawi; John Wiley & Sons, Inc |
Abstract | Class imbalance is an important problem in machine learning applications, in which one class (the minority class) has an extremely small number of instances and the other class (the majority class) has a very large number of instances. Imbalanced datasets arise in several important real-world applications, including medical diagnosis, malware detection, anomaly identification, bankruptcy prediction, and spam filtering. In this paper, we present a consensus clustering-based undersampling approach to imbalanced learning. In this scheme, the majority class is undersampled by means of consensus clustering. The empirical analysis uses 44 small-scale and 2 large-scale imbalanced classification benchmarks. For the consensus clustering schemes, five clustering algorithms (k-means, k-modes, k-means++, self-organizing maps, and the DIANA algorithm) and their combinations were considered. In the classification phase, five supervised learning methods (naïve Bayes, logistic regression, support vector machines, random forests, and the k-nearest neighbor algorithm) and three ensemble learning methods (AdaBoost, bagging, and the random subspace algorithm) were used. The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance. |
---|---|
Author | Onan, Aytuğ |
Author_xml | – sequence: 1 fullname: Onan, Aytuğ |
ContentType | Journal Article |
Copyright | Copyright © 2019 Aytuğ Onan. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0 |
DOI | 10.1155/2019/5901087 |
DatabaseName | e-Marefa Academic and Statistical Periodicals; e-Marefa Academic Complete; Hindawi Publishing Complete; Hindawi Publishing Subscription Journals; Hindawi Publishing Open Access; CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Technology Research Database; Computer and Information Systems Abstracts – Academic; Electronics & Communications Abstracts; ProQuest Computer Science Collection; Computer and Information Systems Abstracts; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database; CrossRef |
Database_xml | – sequence: 1 dbid: RHX name: Hindawi Publishing Open Access url: http://www.hindawi.com/journals/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 1875-919X |
Editor | García-Díaz, Vicente |
Editor_xml | – sequence: 1 givenname: Vicente surname: García-Díaz fullname: García-Díaz, Vicente |
EndPage | 14 |
ExternalDocumentID | 10_1155_2019_5901087 1210743 |
ISSN | 1058-9244 |
IngestDate | Fri Jul 25 09:33:49 EDT 2025 Tue Jul 01 02:50:03 EDT 2025 Thu Apr 24 23:03:43 EDT 2025 Sun Jun 02 19:16:56 EDT 2024 Tue Nov 26 17:10:17 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2019 |
Language | English |
License | This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0002-9434-5880 |
OpenAccessLink | https://dx.doi.org/10.1155/2019/5901087 |
PQID | 2193133422 |
PQPubID | 2046410 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | 2019-01-01 |
PublicationDateYYYYMMDD | 2019-01-01 |
PublicationDate_xml | – month: 01 year: 2019 text: 2019-01-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Cairo, Egypt |
PublicationPlace_xml | – name: Cairo, Egypt – name: New York |
PublicationTitle | Scientific programming |
PublicationYear | 2019 |
Publisher | Hindawi Publishing Corporation; Hindawi; John Wiley & Sons, Inc |
Publisher_xml | – name: Hindawi Publishing Corporation – name: Hindawi – name: John Wiley & Sons, Inc |
SourceID | proquest crossref hindawi emarefa |
SourceType | Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 1 |
SubjectTerms | Algorithms Artificial intelligence Bankruptcy Bayesian analysis Classification Clustering Data mining Datasets Empirical analysis Identification Information science Learning Machine learning Malware Medical diagnosis Performance prediction Regression analysis Self organizing maps Support vector machines Teaching methods |
Title | Consensus Clustering-Based Undersampling Approach to Imbalanced Learning |
URI | https://search.emarefa.net/detail/BIM-1210743 https://dx.doi.org/10.1155/2019/5901087 https://www.proquest.com/docview/2193133422 |
Volume | 2019 |
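
The undersampling scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the author's implementation. As a simplifying assumption, it builds the consensus from repeated k-means runs with different random initializations (a homogeneous ensemble) rather than from the five heterogeneous algorithms named in the abstract, merges points that fall in the same cluster in more than half of the runs, and then draws majority instances round-robin from the resulting consensus clusters. All names and parameters (`kmeans`, `consensus_clusters`, `undersample_majority`, `runs`, `threshold`) are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters keep theirs.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_clusters(points, k, runs, rng, threshold=0.5):
    """Group points that share a cluster in more than `threshold` of the runs."""
    n = len(points)
    together = [[0] * n for _ in range(n)]  # co-association counts
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1

    parent = list(range(n))  # union-find over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if together[i][j] / runs > threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def undersample_majority(majority, n_keep, k=3, runs=5, seed=0):
    """Return sorted indices of the majority instances to keep."""
    rng = random.Random(seed)
    n_keep = min(n_keep, len(majority))
    clusters = consensus_clusters(majority, k, runs, rng)
    for members in clusters:
        rng.shuffle(members)
    kept = []
    # Assumption: round-robin selection, so every consensus cluster is represented.
    while len(kept) < n_keep:
        for members in clusters:
            if members and len(kept) < n_keep:
                kept.append(members.pop())
    return sorted(kept)
```

Setting `n_keep` to the size of the minority class balances the two classes; the returned indices select the majority instances to retain before training a classifier.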