COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction
Published in | Information and Software Technology, Vol. 129, Article 106432 |
---|---|
Main Authors | Feng, Shuo; Keung, Jacky; Yu, Xiao; Xiao, Yan; Bennin, Kwabena Ebo; Kabir, Md Alamgir; Zhang, Miao |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.01.2021 |
Subjects | |
Abstract | Datasets used for software defect prediction (SDP) generally contain more non-defective instances than defective instances, a condition known as the class imbalance problem. Oversampling techniques are frequently adopted to alleviate this problem by generating new synthetic defective instances. However, existing techniques generate either near-duplicate instances or overly diverse instances. Near-duplicate instances cause overgeneralization, which shows up as a high probability of false alarm (pf). Overly diverse instances impair the prediction model's ability to find defects, which shows up as a low probability of detection (pd). Furthermore, when existing oversampling techniques are applied in SDP, they do not consider that instances of different complexity require different inspection effort.
In this study, we introduce the Complexity-based OverSampling TEchnique (COSTE), a novel oversampling technique that can achieve low pf and high pd simultaneously. COSTE also performs better than existing oversampling techniques in terms of Norm(Popt) and ACC, two effort-aware measures that take testing effort into account.
COSTE generates synthetic instances by combining pairs of defective instances with similar complexity. This improves the diversity of the data and maintains the ability of prediction models to find defects. It also accounts for the different testing effort that different instances require. We conduct experiments comparing COSTE with the Synthetic Minority Oversampling TEchnique (SMOTE), Borderline-SMOTE, the Majority Weighted Minority Oversampling TEchnique (MWMOTE), and MAHAKIL.
Experimental results on 23 releases of 10 projects show that COSTE greatly improves the diversity of the synthetic instances without compromising the ability of prediction models to find defects. In addition, COSTE outperforms the other oversampling techniques under the same testing effort. Statistical analysis shows that COSTE's advantage over the other oversampling techniques is significant according to the Wilcoxon rank-sum test and Cliff's effect size.
COSTE is recommended as an efficient alternative to address the class imbalance problem in SDP. |
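The abstract evaluates models with pd and pf. These two measures are usually computed from the confusion matrix of a binary defect classifier. The following Python sketch assumes labels where 1 marks a defective instance and 0 a non-defective one. The function name `pd_pf` is hypothetical and is not taken from the paper.

```python
from typing import Sequence, Tuple


def pd_pf(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (pd, pf) for binary labels (1 = defective, 0 = non-defective).

    pd = TP / (TP + FN)  probability of detection (recall on the defective class)
    pf = FP / (FP + TN)  probability of false alarm
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against empty classes instead of dividing by zero.
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf
```

For example, one miss out of three defective instances gives pd = 2/3. One false alarm out of five non-defective instances gives pf = 0.2. The abstract's goal is to raise pd without the pf increase that near-duplicate oversampling tends to cause.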
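Norm(Popt) and ACC are effort-aware measures that rank instances by predicted risk per unit of inspection effort, typically lines of code (LOC). The sketch below follows definitions commonly used in the effort-aware SDP literature, not necessarily the paper's exact formulation.

- Norm(Popt) = (Area(model) - Area(worst)) / (Area(optimal) - Area(worst)), where each area lies under the curve of cumulative LOC fraction against cumulative defect fraction.
- ACC is the fraction of defective instances found after inspecting 20% of total LOC.

The ranking by score/LOC, the tie-breaking rule, the stop-before-budget rule, and all function names are assumptions. LOC values are assumed positive, and at least one defective instance is assumed.

```python
from typing import List, Sequence


def _area(order: Sequence[int], loc: Sequence[float], bug: Sequence[int]) -> float:
    """Trapezoid-rule area under the (cumulative LOC fraction, defect fraction) curve."""
    total_loc, total_bug = float(sum(loc)), float(sum(bug))
    x_prev = y_prev = area = 0.0
    cum_loc = cum_bug = 0.0
    for i in order:
        cum_loc += loc[i]
        cum_bug += bug[i]
        x, y = cum_loc / total_loc, cum_bug / total_bug
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


def _rank_by_density(values: Sequence[float], loc: Sequence[float]) -> List[int]:
    # Highest value per LOC first; ties broken by smaller LOC (an assumption).
    return sorted(range(len(loc)), key=lambda i: (-values[i] / loc[i], loc[i]))


def norm_popt(score: Sequence[float], loc: Sequence[float], bug: Sequence[int]) -> float:
    """Norm(Popt) = (Area(model) - Area(worst)) / (Area(optimal) - Area(worst))."""
    model = _rank_by_density(score, loc)
    optimal = _rank_by_density(bug, loc)  # actual defect density, best first
    worst = sorted(range(len(loc)), key=lambda i: (bug[i] / loc[i], -loc[i]))
    a_m, a_opt, a_w = (_area(o, loc, bug) for o in (model, optimal, worst))
    return (a_m - a_w) / (a_opt - a_w)


def acc(score: Sequence[float], loc: Sequence[float], bug: Sequence[int],
        effort: float = 0.2) -> float:
    """Fraction of defective instances found within the first `effort` share of LOC."""
    budget, spent, found = effort * sum(loc), 0.0, 0
    for i in _rank_by_density(score, loc):
        if spent + loc[i] > budget:
            break  # stop before exceeding the inspection budget (an assumption)
        spent += loc[i]
        found += bug[i]
    return found / sum(bug)
```

With LOC [10, 20, 30, 40] and labels [1, 0, 1, 0], perfect scores reproduce the optimal ordering, so Norm(Popt) is 1.0. Inspecting the first 20% of LOC then finds half of the defective instances, so ACC is 0.5. Because both measures depend on LOC as well as labels, they can rank techniques differently from pd and pf.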
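The abstract describes COSTE only at a high level: synthetic instances come from pairs of defective instances with similar complexity. The following Python sketch illustrates that general idea and is not the authors' implementation. Three choices in it are hypothetical and made only for illustration:

- the complexity score is the sum of an instance's metric values;
- pairs are formed from neighbours in complexity order;
- each synthetic instance uses a random interpolation weight between its two parents.

```python
import random
from typing import List, Sequence


def complexity(x: Sequence[float]) -> float:
    # Hypothetical complexity score: the sum of the instance's metric values.
    # The paper's own complexity measure may differ.
    return float(sum(x))


def complexity_pair_oversample(
    defective: Sequence[Sequence[float]], n_new: int, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic defective instances from similar-complexity pairs."""
    if len(defective) < 2:
        raise ValueError("need at least two defective instances")
    rng = random.Random(seed)
    # Sort by complexity so that neighbouring instances have similar complexity.
    ranked = sorted(defective, key=complexity)
    pairs = [(ranked[i], ranked[i + 1]) for i in range(len(ranked) - 1)]
    synthetic: List[List[float]] = []
    for k in range(n_new):
        a, b = pairs[k % len(pairs)]  # cycle through the adjacent pairs
        gap = rng.random()            # interpolation weight in [0.0, 1.0)
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

To balance a training set, one would typically set `n_new` to the number of non-defective instances minus the number of defective ones. With this particular complexity score, every synthetic instance's complexity falls between its two parents' complexities. This is one simple way to keep generated instances close to the effort profile of real defective instances.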
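The significance claim combines a hypothesis test with an effect size. The Wilcoxon rank-sum test is available in SciPy as `scipy.stats.ranksums`. Cliff's delta is simple enough to compute directly, as in this sketch. The magnitude thresholds (0.147, 0.33, 0.474) are a commonly used convention and are assumed here; the paper may interpret effect sizes differently.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all (x, y) pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta: float) -> str:
    """Label |delta| with commonly used (assumed) thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

For example, comparing per-release scores [3, 4, 5] against [1, 2, 3] gives 8 "greater" pairs and no "less" pairs out of 9, so delta = 8/9, a large effect. The pairwise loop is O(n*m), which is fine for the small samples typical of per-release comparisons.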
ArticleNumber | 106432 |
Author | Xiao, Yan; Bennin, Kwabena Ebo; Kabir, Md Alamgir; Zhang, Miao; Keung, Jacky; Feng, Shuo; Yu, Xiao |
Author_xml | 1. Shuo Feng (ORCID 0000-0002-1575-9891; shuofeng5-c@my.cityu.edu.hk), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China; 2. Jacky Keung (ORCID 0000-0002-3803-9600; jacky.keung@cityu.edu.hk), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China; 3. Xiao Yu (xyu224-c@my.cityu.edu.hk), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China; 4. Yan Xiao (xiaoyan.hhu@gmail.com), School of Computing, National University of Singapore, 117417, Singapore; 5. Kwabena Ebo Bennin (kwabena.bennin@wur.nl), Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands; 6. Md Alamgir Kabir (ORCID 0000-0002-7136-6339; makabir4-c@my.cityu.edu.hk), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China; 7. Miao Zhang (miazhang9-c@my.cityu.edu.hk), Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China |
CitedBy_id | crossref_primary_10_1109_ACCESS_2025_3532250 crossref_primary_10_1049_2023_6293074 crossref_primary_10_1109_TSE_2024_3492204 crossref_primary_10_1142_S0219649223500478 crossref_primary_10_1002_smr_2634 crossref_primary_10_4018_IJSI_309735 crossref_primary_10_1109_ACCESS_2023_3239266 crossref_primary_10_1007_s10489_024_05930_z crossref_primary_10_1016_j_eswa_2024_125919 crossref_primary_10_1016_j_eswa_2023_122409 crossref_primary_10_1109_ACCESS_2023_3262604 crossref_primary_10_1016_j_asoc_2023_110952 crossref_primary_10_1016_j_eswa_2023_121039 crossref_primary_10_1049_sfw2_12099 crossref_primary_10_1016_j_scico_2024_103164 crossref_primary_10_3390_s21103314 crossref_primary_10_3233_JIFS_221902 crossref_primary_10_1007_s10489_025_06288_6 crossref_primary_10_1109_ACCESS_2024_3396155 crossref_primary_10_1007_s10586_024_04446_y crossref_primary_10_1016_j_infsof_2021_106662 crossref_primary_10_7717_peerj_cs_2270 crossref_primary_10_1016_j_infsof_2021_106588 crossref_primary_10_1016_j_infsof_2021_106742 crossref_primary_10_1016_j_infsof_2022_107016 crossref_primary_10_1049_2024_5550801 crossref_primary_10_32604_cmc_2024_057538 crossref_primary_10_1016_j_knosys_2024_111835 crossref_primary_10_1109_TR_2023_3295012 crossref_primary_10_1111_exsy_12977 crossref_primary_10_1016_j_ins_2022_07_130 crossref_primary_10_1016_j_infsof_2021_106747 crossref_primary_10_1007_s10462_022_10371_6 crossref_primary_10_1109_ACCESS_2022_3211401 crossref_primary_10_1109_TR_2022_3158949 crossref_primary_10_1145_3699602 crossref_primary_10_1142_S0219467824500451 crossref_primary_10_1109_TR_2024_3393734 crossref_primary_10_1080_1206212X_2023_2252117 crossref_primary_10_3390_app131810466 crossref_primary_10_1007_s11219_023_09615_7 crossref_primary_10_1007_s11334_024_00571_4 crossref_primary_10_1002_smr_2731 crossref_primary_10_1007_s40747_022_00676_y crossref_primary_10_1186_s40537_023_00715_6 crossref_primary_10_1109_ACCESS_2025_3550583 crossref_primary_10_1109_ACCESS_2022_3211978 
crossref_primary_10_1016_j_jjimei_2022_100153 crossref_primary_10_1016_j_infsof_2023_107250 crossref_primary_10_1016_j_eswa_2023_121251 crossref_primary_10_1016_j_jss_2023_111858 crossref_primary_10_1007_s00500_024_09881_y crossref_primary_10_1016_j_infsof_2022_106985 crossref_primary_10_1016_j_eswa_2023_123041 crossref_primary_10_1007_s10664_022_10186_7 crossref_primary_10_1007_s11334_025_00601_9 crossref_primary_10_4018_IJSSCI_301268 crossref_primary_10_1109_TR_2023_3272651 crossref_primary_10_1155_acis_1013769 crossref_primary_10_1007_s11219_023_09640_6 crossref_primary_10_1007_s11334_021_00399_2 crossref_primary_10_1016_j_jss_2024_112131 crossref_primary_10_1007_s11227_024_06312_5 crossref_primary_10_1016_j_neucom_2024_128538 crossref_primary_10_1016_j_rineng_2025_104123 crossref_primary_10_3233_IDA_226612 crossref_primary_10_3390_sym14122508 crossref_primary_10_1002_spe_3316 crossref_primary_10_1007_s13369_024_08740_0 crossref_primary_10_1016_j_asoc_2022_109069 |
Cites_doi | 10.1109/TIT.1967.1053964 10.1109/IRI.2012.6303039 10.1007/s10515-019-00259-1 10.1007/s00521-007-0089-7 10.1109/TKDE.2012.232 10.1016/j.infsof.2020.106287 10.1016/j.infsof.2007.02.015 10.1016/j.engappai.2017.10.019 10.1016/j.infsof.2019.07.003 10.1016/j.infsof.2011.09.007 10.1007/s10664-018-9633-6 10.1007/s13748-016-0094-0 10.1109/TSE.2007.256941 10.3233/IDA-2002-6504 10.1007/s10664-008-9079-3 10.1007/s00500-018-3093-1 10.1109/ACCESS.2018.2817572 10.1109/TSE.2007.70721 10.1016/j.ins.2016.09.041 10.1109/TSE.2016.2584050 10.1023/A:1008202821328 10.1109/ICMLA.2010.27 10.1109/ASE.2015.56 10.1016/j.bmcl.2005.01.061 10.1613/jair.953 10.1007/s10586-017-0892-6 10.1109/TSE.2005.49 10.1007/s10664-012-9218-8 10.1109/TSE.2012.70 10.1016/j.knosys.2012.12.007 10.1016/j.infsof.2014.12.006 10.1109/TSE.2018.2877678 10.1007/s10664-008-9103-7 10.1007/s11219-016-9342-6 10.1109/TSE.1984.5010196 10.1007/s10515-017-0220-7 10.1016/j.infsof.2014.07.005 10.1016/S0031-3203(96)00142-2 10.1016/j.asoc.2020.106089 10.1016/j.enbuild.2016.05.028 10.1109/TSMCC.2012.2226152 10.1016/j.eswa.2016.06.005 10.1109/TSE.2016.2543218 10.1016/j.infsof.2018.04.001 10.1109/TSE.2017.2731766 10.1016/j.infsof.2014.11.006 10.1016/S0169-7439(99)00047-7 10.1109/TKDE.2017.2779849 10.1145/2950290.2950353 10.1007/s10489-014-0610-5 10.1109/TR.2014.2316951 10.1109/TSE.2016.2599161 10.1016/j.infsof.2017.07.004 |
ContentType | Journal Article |
Copyright | 2020 Elsevier B.V. |
DOI | 10.1016/j.infsof.2020.106432 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Business |
EISSN | 1873-6025 |
ExternalDocumentID | 10_1016_j_infsof_2020_106432 S0950584920301889 |
ISSN | 0950-5849 |
IngestDate | Thu Apr 24 23:11:01 EDT 2025 Tue Jul 01 02:22:04 EDT 2025 Mon Apr 08 05:17:07 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | MAHAKIL Effort-aware defect prediction SMOTE Oversampling Software defect prediction Class imbalance |
LinkModel | DirectLink |
ORCID | 0000-0002-1575-9891 0000-0002-3803-9600 0000-0002-7136-6339 |
ParticipantIDs | crossref_primary_10_1016_j_infsof_2020_106432 crossref_citationtrail_10_1016_j_infsof_2020_106432 elsevier_sciencedirect_doi_10_1016_j_infsof_2020_106432 |
PublicationCentury | 2000 |
PublicationDate | January 2021 |
PublicationDateYYYYMMDD | 2021-01-01 |
PublicationDecade | 2020 |
PublicationTitle | Information and Software Technology |
PublicationYear | 2021 |
Publisher | Elsevier B.V. |
11 – start-page: 452 year: 2006 end-page: 461 ident: b7 article-title: Mining metrics to predict component failures publication-title: Proceedings of the 28th International Conference on Software Engineering – start-page: 364 year: 2017 end-page: 373 ident: b29 article-title: The significant effects of data sampling approaches on software defect prioritization and classification publication-title: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) – volume: 39 start-page: 757 year: 2012 end-page: 773 ident: b31 article-title: A large-scale empirical study of just-in-time quality assurance publication-title: IEEE Trans. Softw. Eng. – volume: 68 start-page: 81 year: 2018 end-page: 100 ident: b38 article-title: Optimal power flow solutions using differential evolution algorithm integrated with effective constraint handling techniques publication-title: Eng. Appl. Artif. Intell. – start-page: 878 year: 2005 end-page: 887 ident: b23 article-title: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning publication-title: International Conference on Intelligent Computing – volume: 15 start-page: 1629 year: 2005 end-page: 1632 ident: b41 article-title: Synthesis and activity of oleanolic acid derivatives, a novel class of inhibitors of osteoclast formation publication-title: Bioorganic Med. Chem. Lett. – volume: 33 start-page: 637 year: 2007 end-page: 640 ident: b74 article-title: Problems with precision: A response to “comments on’data mining static code attributes to learn defect predictors’” publication-title: IEEE Trans. Softw. Eng. 
– start-page: 414 year: 2014 end-page: 423 ident: b49 article-title: Dictionary learning based software defect prediction publication-title: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014 – start-page: 104 year: 2011 end-page: 111 ident: b64 article-title: Local neighbourhood extension of SMOTE for mining imbalanced data publication-title: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) – volume: 33 start-page: 2 year: 2006 end-page: 13 ident: b36 article-title: Data mining static code attributes to learn defect predictors publication-title: IEEE Trans. Softw. Eng. – volume: 2019 year: 2019 ident: 10.1016/j.infsof.2020.106432_b76 article-title: Software defect prediction via attention-based recurrent neural network publication-title: Sci. Program. – volume: 13 start-page: 21 issue: 1 year: 1967 ident: 10.1016/j.infsof.2020.106432_b68 article-title: Nearest neighbor pattern classification publication-title: IEEE Trans. Inform. Theory doi: 10.1109/TIT.1967.1053964 – ident: 10.1016/j.infsof.2020.106432_b73 doi: 10.1109/IRI.2012.6303039 – volume: 26 start-page: 599 issue: 3 year: 2019 ident: 10.1016/j.infsof.2020.106432_b61 article-title: Heterogeneous defect prediction with two-stage ensemble learning publication-title: Autom. Softw. Eng. doi: 10.1007/s10515-019-00259-1 – volume: 16 start-page: 295 issue: 3 year: 2007 ident: 10.1016/j.infsof.2020.106432_b15 article-title: A data reduction approach for resolving the imbalanced data issue in functional genomics publication-title: Neural Comput. Appl. doi: 10.1007/s00521-007-0089-7 – volume: 26 start-page: 405 issue: 2 year: 2012 ident: 10.1016/j.infsof.2020.106432_b54 article-title: MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning publication-title: IEEE Trans. Knowl. Data Eng. 
doi: 10.1109/TKDE.2012.232 – start-page: 698 year: 2019 ident: 10.1016/j.infsof.2020.106432_b58 article-title: Empirical evaluation of the impact of class overlap on software defect prediction – year: 2020 ident: 10.1016/j.infsof.2020.106432_b60 article-title: A systematic review of unsupervised learning techniques for software defect prediction publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2020.106287 – volume: 49 start-page: 1073 issue: 11–12 year: 2007 ident: 10.1016/j.infsof.2020.106432_b67 article-title: A systematic review of effect size in software engineering experiments publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2007.02.015 – volume: 68 start-page: 81 year: 2018 ident: 10.1016/j.infsof.2020.106432_b38 article-title: Optimal power flow solutions using differential evolution algorithm integrated with effective constraint handling techniques publication-title: Eng. Appl. Artif. Intell. doi: 10.1016/j.engappai.2017.10.019 – volume: 114 start-page: 204 year: 2019 ident: 10.1016/j.infsof.2020.106432_b59 article-title: Improving defect prediction with deep forest publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2019.07.003 – volume: 54 start-page: 248 issue: 3 year: 2012 ident: 10.1016/j.infsof.2020.106432_b10 article-title: Transfer learning for cross-company software defect prediction publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2011.09.007 – volume: 24 start-page: 602 issue: 2 year: 2019 ident: 10.1016/j.infsof.2020.106432_b16 article-title: On the relative value of data resampling approaches for software defect prediction publication-title: Empir. Softw. Eng. doi: 10.1007/s10664-018-9633-6 – volume: 5 start-page: 221 issue: 4 year: 2016 ident: 10.1016/j.infsof.2020.106432_b13 article-title: Learning from imbalanced data: open challenges and future directions publication-title: Prog. Artif. Intell. 
doi: 10.1007/s13748-016-0094-0 – start-page: 309 year: 2016 ident: 10.1016/j.infsof.2020.106432_b34 article-title: Cross-project defect prediction using a connectivity-based unsupervised classifier – volume: 33 start-page: 2 issue: 1 year: 2006 ident: 10.1016/j.infsof.2020.106432_b36 article-title: Data mining static code attributes to learn defect predictors publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2007.256941 – volume: 6 start-page: 429 issue: 5 year: 2002 ident: 10.1016/j.infsof.2020.106432_b21 article-title: The class imbalance problem: A systematic study publication-title: Intell. Data Anal. doi: 10.3233/IDA-2002-6504 – year: 2005 ident: 10.1016/j.infsof.2020.106432_b37 – start-page: 414 year: 2014 ident: 10.1016/j.infsof.2020.106432_b49 article-title: Dictionary learning based software defect prediction – volume: 13 start-page: 561 issue: 5 year: 2008 ident: 10.1016/j.infsof.2020.106432_b63 article-title: Techniques for evaluating fault prediction models publication-title: Empir. Softw. Eng. doi: 10.1007/s10664-008-9079-3 – start-page: 1 year: 2003 ident: 10.1016/j.infsof.2020.106432_b52 article-title: C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling – volume: 22 start-page: 3461 issue: 10 year: 2018 ident: 10.1016/j.infsof.2020.106432_b50 article-title: Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning publication-title: Soft Comput. 
doi: 10.1007/s00500-018-3093-1 – volume: 6 start-page: 24184 year: 2018 ident: 10.1016/j.infsof.2020.106432_b55 article-title: An ensemble oversampling model for class imbalance problem in software defect prediction publication-title: IEEE Access doi: 10.1109/ACCESS.2018.2817572 – start-page: 104 year: 2011 ident: 10.1016/j.infsof.2020.106432_b64 article-title: Local neighbourhood extension of SMOTE for mining imbalanced data – volume: 33 start-page: 637 issue: 9 year: 2007 ident: 10.1016/j.infsof.2020.106432_b74 article-title: Problems with precision: A response to “comments on’data mining static code attributes to learn defect predictors’” publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2007.70721 – start-page: 364 year: 2017 ident: 10.1016/j.infsof.2020.106432_b29 article-title: The significant effects of data sampling approaches on software defect prioritization and classification – start-page: 452 year: 2006 ident: 10.1016/j.infsof.2020.106432_b7 article-title: Mining metrics to predict component failures – start-page: 2354 year: 2013 ident: 10.1016/j.infsof.2020.106432_b27 article-title: A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets – volume: 379 start-page: 211 year: 2017 ident: 10.1016/j.infsof.2020.106432_b79 article-title: Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data publication-title: Inform. Sci. doi: 10.1016/j.ins.2016.09.041 – volume: 43 start-page: 1 issue: 1 year: 2016 ident: 10.1016/j.infsof.2020.106432_b75 article-title: An empirical comparison of model validation techniques for defect prediction models publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2016.2584050 – volume: 11 start-page: 341 issue: 4 year: 1997 ident: 10.1016/j.infsof.2020.106432_b57 article-title: Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces publication-title: J. 
Global Optim. doi: 10.1023/A:1008202821328 – ident: 10.1016/j.infsof.2020.106432_b47 doi: 10.1109/ICMLA.2010.27 – year: 2001 ident: 10.1016/j.infsof.2020.106432_b14 – volume: 12 start-page: 2825 year: 2011 ident: 10.1016/j.infsof.2020.106432_b72 article-title: Scikit-learn: Machine learning in python publication-title: J. Mach. Learn. Res. – start-page: 878 year: 2005 ident: 10.1016/j.infsof.2020.106432_b23 article-title: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning – ident: 10.1016/j.infsof.2020.106432_b35 doi: 10.1109/ASE.2015.56 – start-page: 237 year: 2007 ident: 10.1016/j.infsof.2020.106432_b69 article-title: A statistical framework for the prediction of fault-proneness – volume: 15 start-page: 1629 issue: 6 year: 2005 ident: 10.1016/j.infsof.2020.106432_b41 article-title: Synthesis and activity of oleanolic acid derivatives, a novel class of inhibitors of osteoclast formation publication-title: Bioorganic Med. Chem. Lett. doi: 10.1016/j.bmcl.2005.01.061 – volume: 16 start-page: 321 year: 2002 ident: 10.1016/j.infsof.2020.106432_b22 article-title: SMOTE: synthetic minority over-sampling technique publication-title: J. Artificial Intelligence Res. doi: 10.1613/jair.953 – volume: 20 start-page: 2267 issue: 3 year: 2017 ident: 10.1016/j.infsof.2020.106432_b6 article-title: A parallel framework for software defect detection and metric selection on cloud computing publication-title: Cluster Comput. doi: 10.1007/s10586-017-0892-6 – volume: 31 start-page: 340 issue: 4 year: 2005 ident: 10.1016/j.infsof.2020.106432_b8 article-title: Predicting the location and number of faults in large software systems publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2005.49 – volume: 19 start-page: 154 issue: 1 year: 2014 ident: 10.1016/j.infsof.2020.106432_b11 article-title: Software defect prediction using Bayesian networks publication-title: Empir. Softw. Eng. 
doi: 10.1007/s10664-012-9218-8 – volume: 39 start-page: 757 issue: 6 year: 2012 ident: 10.1016/j.infsof.2020.106432_b31 article-title: A large-scale empirical study of just-in-time quality assurance publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2012.70 – volume: 41 start-page: 16 year: 2013 ident: 10.1016/j.infsof.2020.106432_b20 article-title: Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods publication-title: Knowl.-Based Syst. doi: 10.1016/j.knosys.2012.12.007 – start-page: 1 year: 2000 ident: 10.1016/j.infsof.2020.106432_b43 article-title: Machine learning from imbalanced data sets 101 – start-page: 107 year: 2010 ident: 10.1016/j.infsof.2020.106432_b32 article-title: Effort-aware defect prediction models – volume: 61 start-page: 93 year: 2015 ident: 10.1016/j.infsof.2020.106432_b46 article-title: ELBlocker: Predicting blocking bugs with ensemble imbalance learning publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2014.12.006 – year: 2018 ident: 10.1016/j.infsof.2020.106432_b4 article-title: Perceptions, expectations, and challenges in defect prediction publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2018.2877678 – volume: 14 start-page: 540 issue: 5 year: 2009 ident: 10.1016/j.infsof.2020.106432_b28 article-title: On the relative value of cross-company and within-company data for defect prediction publication-title: Empir. Softw. Eng. doi: 10.1007/s10664-008-9103-7 – start-page: 318 year: 2017 ident: 10.1016/j.infsof.2020.106432_b78 article-title: Software defect prediction via convolutional neural network – volume: 26 start-page: 97 issue: 1 year: 2018 ident: 10.1016/j.infsof.2020.106432_b18 article-title: Tackling class overlap and imbalance problems in software defect prediction publication-title: Softw. Qual. J. 
doi: 10.1007/s11219-016-9342-6 – volume: SE-10 start-page: 36 issue: 1 year: 1984 ident: 10.1016/j.infsof.2020.106432_b1 article-title: Software quality assurance publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.1984.5010196 – volume: 25 start-page: 201 issue: 2 year: 2018 ident: 10.1016/j.infsof.2020.106432_b5 article-title: Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction publication-title: Autom. Softw. Eng. doi: 10.1007/s10515-017-0220-7 – start-page: 1322 year: 2008 ident: 10.1016/j.infsof.2020.106432_b24 article-title: ADASYN: Adaptive synthetic sampling approach for imbalanced learning – volume: 58 start-page: 388 year: 2015 ident: 10.1016/j.infsof.2020.106432_b45 article-title: Software defect prediction using ensemble learning on selected features publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2014.07.005 – volume: 30 start-page: 1145 issue: 7 year: 1997 ident: 10.1016/j.infsof.2020.106432_b66 article-title: The use of the area under the ROC curve in the evaluation of machine learning algorithms publication-title: Pattern Recognit. doi: 10.1016/S0031-3203(96)00142-2 – start-page: 192 year: 2008 ident: 10.1016/j.infsof.2020.106432_b53 article-title: On the class imbalance problem – volume: 89 year: 2020 ident: 10.1016/j.infsof.2020.106432_b71 article-title: An adaptive framework against android privilege escalation threats using deep learning and semi-supervised approaches publication-title: Appl. Soft Comput. doi: 10.1016/j.asoc.2020.106089 – start-page: 1050 year: 2018 ident: 10.1016/j.infsof.2020.106432_b30 article-title: Is “better data” better than “better data miners”? – volume: 126 start-page: 94 year: 2016 ident: 10.1016/j.infsof.2020.106432_b39 article-title: Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique publication-title: Energy Build. 
doi: 10.1016/j.enbuild.2016.05.028 – volume: 42 start-page: 1806 issue: 6 year: 2012 ident: 10.1016/j.infsof.2020.106432_b44 article-title: Using coding-based ensemble learning to improve software defect prediction publication-title: IEEE Trans. Syst. Man Cybern. B doi: 10.1109/TSMCC.2012.2226152 – volume: 62 start-page: 1 year: 2016 ident: 10.1016/j.infsof.2020.106432_b40 article-title: A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification publication-title: Expert Syst. Appl. doi: 10.1016/j.eswa.2016.06.005 – volume: 42 start-page: 977 issue: 10 year: 2016 ident: 10.1016/j.infsof.2020.106432_b62 article-title: Hydra: Massively compositional model for cross-project defect prediction publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2016.2543218 – volume: 100 start-page: 87 year: 2018 ident: 10.1016/j.infsof.2020.106432_b3 article-title: Cross project defect prediction using class distribution estimation and oversampling publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2018.04.001 – year: 2002 ident: 10.1016/j.infsof.2020.106432_b25 – volume: 44 start-page: 534 issue: 6 year: 2018 ident: 10.1016/j.infsof.2020.106432_b12 article-title: MAHAKIL: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction publication-title: IEEE Trans. Softw. Eng. doi: 10.1109/TSE.2017.2731766 – start-page: 1 year: 2013 ident: 10.1016/j.infsof.2020.106432_b42 article-title: Data mining for microrna gene prediction: on the impact of class imbalance and feature number for microrna gene prediction – volume: 59 start-page: 170 year: 2015 ident: 10.1016/j.infsof.2020.106432_b65 article-title: An empirical study on software defect prediction with a simplified metric set publication-title: Inf. Softw. Technol. 
doi: 10.1016/j.infsof.2014.11.006 – volume: 50 start-page: 1 issue: 1 year: 2000 ident: 10.1016/j.infsof.2020.106432_b26 article-title: The mahalanobis distance publication-title: Chemometr. Intell. Lab. Syst. doi: 10.1016/S0169-7439(99)00047-7 – volume: 30 start-page: 950 issue: 5 year: 2017 ident: 10.1016/j.infsof.2020.106432_b56 article-title: Minority oversampling in kernel adaptive subspaces for class imbalanced datasets publication-title: IEEE Trans. Knowl. Data Eng. doi: 10.1109/TKDE.2017.2779849 – ident: 10.1016/j.infsof.2020.106432_b33 doi: 10.1145/2950290.2950353 – year: 2005 ident: 10.1016/j.infsof.2020.106432_b70 article-title: A novel method for early software quality prediction based on support vector machine – start-page: 487 year: 2006 ident: 10.1016/j.infsof.2020.106432_b9 article-title: A method for an accurate early prediction of faults in modified classes – volume: 42 start-page: 544 issue: 3 year: 2015 ident: 10.1016/j.infsof.2020.106432_b19 article-title: A dissimilarity-based imbalance data classification algorithm publication-title: Appl. Intell. doi: 10.1007/s10489-014-0610-5 – volume: 63 start-page: 676 issue: 2 year: 2014 ident: 10.1016/j.infsof.2020.106432_b48 article-title: Two-stage cost-sensitive learning for software defect prediction publication-title: IEEE Trans. Reliab. doi: 10.1109/TR.2014.2316951 – volume: 2016 start-page: 6:6 year: 2016 ident: 10.1016/j.infsof.2020.106432_b51 article-title: Prediction of defective software modules using class imbalance learning publication-title: Appl. Comp. Intell. Soft Comput. – volume: 43 start-page: 476 issue: 5 year: 2016 ident: 10.1016/j.infsof.2020.106432_b2 article-title: The use of summation to aggregate software metrics hinders the performance of defect prediction models publication-title: IEEE Trans. Softw. Eng. 
doi: 10.1109/TSE.2016.2599161 – volume: 92 start-page: 17 year: 2017 ident: 10.1016/j.infsof.2020.106432_b77 article-title: Which type of metrics are useful to deal with class imbalance in software defect prediction? publication-title: Inf. Softw. Technol. doi: 10.1016/j.infsof.2017.07.004 – start-page: 630 year: 2017 ident: 10.1016/j.infsof.2020.106432_b17 article-title: Impact of the distribution parameter of data sampling approaches on software defect prediction models |
StartPage | 106432 |
Subject terms | Class imbalance; Effort-aware defect prediction; MAHAKIL; Oversampling; SMOTE; Software defect prediction |
Title | COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction |
URI | https://dx.doi.org/10.1016/j.infsof.2020.106432 |
Volume | 129 |
Authors | Feng, Shuo; Keung, Jacky; Yu, Xiao; Xiao, Yan; et al. |
ISSN | 0950-5849 |