COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction

Bibliographic Details
Published in: Information and Software Technology, Vol. 129, p. 106432
Main Authors: Feng, Shuo; Keung, Jacky; Yu, Xiao; Xiao, Yan; Bennin, Kwabena Ebo; Kabir, Md Alamgir; Zhang, Miao
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.01.2021

Abstract: Generally, the datasets used for software defect prediction (SDP) contain more non-defective instances than defective instances, which is referred to as the class imbalance problem. Oversampling techniques are frequently adopted to alleviate the problem by generating new synthetic defective instances. Existing techniques generate either near-duplicate instances, which result in overgeneralization (a high probability of false alarm, pf), or overly diverse instances, which hurt the prediction model’s ability to find defects (resulting in a low probability of detection, pd). Furthermore, when existing oversampling techniques are applied in SDP, they do not take into consideration the effort needed to inspect instances of different complexity. In this study, we introduce the Complexity-based OverSampling TEchnique (COSTE), a novel oversampling technique that can achieve low pf and high pd simultaneously. COSTE also performs better in terms of Norm(Popt) and ACC, two effort-aware measures that consider the testing effort. COSTE combines pairs of defective instances with similar complexity to generate synthetic instances, which improves the diversity within the data, maintains the ability of prediction models to find defects, and takes into consideration the different testing effort needed for different instances. We conduct experiments to compare COSTE with the Synthetic Minority Oversampling TEchnique (SMOTE), Borderline-SMOTE, the Majority Weighted Minority Oversampling TEchnique (MWMOTE), and MAHAKIL. The experimental results on 23 releases of 10 projects show that COSTE greatly improves the diversity of the synthetic instances without compromising the ability of prediction models to find defects. In addition, COSTE outperforms the other oversampling techniques under the same testing effort.
The statistical analysis indicates that COSTE’s advantage over the other oversampling techniques is supported by the Wilcoxon rank-sum test and by Cliff’s delta effect size. COSTE is recommended as an efficient alternative for addressing the class imbalance problem in SDP.
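The abstract states only that COSTE “combines pairs of defective instances with similar complexity to generate synthetic instances”; it does not specify the complexity measure or the combination rule. The Python sketch below illustrates that idea under explicit assumptions and is not the authors’ algorithm: complexity is taken to be a single metric column (for example, lines of code) chosen by the hypothetical parameter `complexity_index`, “similar complexity” is taken to mean adjacent after sorting on that column, and each synthetic instance is a random linear interpolation between the two members of a pair, in the style of SMOTE. The function name `complexity_pair_oversample` is illustrative only.

```python
import random


def complexity_pair_oversample(defective, complexity_index, n_new, seed=0):
    """Hypothetical sketch, NOT the published COSTE algorithm.

    Assumptions: each row of `defective` is a list of numeric metrics,
    complexity is the single metric at `complexity_index` (e.g. LOC), and a
    synthetic row is a random interpolation between two defective rows that
    are neighbours once the rows are sorted by that complexity metric.
    """
    if len(defective) < 2:
        raise ValueError("need at least two defective instances")
    rng = random.Random(seed)
    # Sort so that neighbouring rows have similar complexity values.
    ordered = sorted(defective, key=lambda row: row[complexity_index])
    # Pair each row with its next neighbour in complexity order.
    pairs = [(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)]
    synthetic = []
    for k in range(n_new):
        a, b = pairs[k % len(pairs)]  # cycle through the similar-complexity pairs
        w = rng.random()              # interpolation weight in [0, 1)
        synthetic.append([x + w * (y - x) for x, y in zip(a, b)])
    return synthetic
```

In this sketch every synthetic row lies on the segment between two defective rows that are neighbours in complexity order, so its complexity value always falls between the complexity values of its two parents.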
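The two measures the abstract uses to describe overgeneralization and lost defect-finding ability are the standard confusion-matrix rates: pd = TP / (TP + FN), i.e. recall on the defective class, and pf = FP / (FP + TN), the share of non-defective instances wrongly flagged. A minimal Python sketch, assuming labels are encoded as 1 for defective and 0 for non-defective (the function name `pd_pf` is illustrative):

```python
def pd_pf(y_true, y_pred):
    """Return (pd, pf) for binary labels, assuming 1 = defective, 0 = clean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, not flagged
    pd = tp / (tp + fn) if tp + fn else 0.0  # probability of detection
    pf = fp / (fp + tn) if fp + tn else 0.0  # probability of false alarm
    return pd, pf
```

For example, with three defective and five non-defective modules, a model that flags two of the true defects and one clean module gives pd = 2/3 and pf = 1/5.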
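The abstract pairs the Wilcoxon rank-sum test with Cliff’s delta. The rank-sum test is available in SciPy as `scipy.stats.ranksums`; Cliff’s delta is simple enough to compute directly as delta = (#{x > y} − #{x < y}) / (m · n) over all m · n cross-sample pairs. The sketch below does that in plain Python. The magnitude labels use the thresholds commonly cited from Romano et al. (0.147, 0.33, 0.474); whether the paper used exactly these cut-offs is an assumption.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta between two samples, in [-1, 1].

    Positive values mean that values in xs tend to be larger than those in ys.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(delta):
    """Label |delta| with commonly cited thresholds (assumed, see text)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"
```

The pairwise loop is O(m · n), which is fine for per-release score lists of the size typically compared in such studies.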
ArticleNumber 106432
Author Xiao, Yan
Bennin, Kwabena Ebo
Kabir, Md Alamgir
Zhang, Miao
Keung, Jacky
Feng, Shuo
Yu, Xiao
Author_xml – sequence: 1
  givenname: Shuo
  orcidid: 0000-0002-1575-9891
  surname: Feng
  fullname: Feng, Shuo
  email: shuofeng5-c@my.cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
– sequence: 2
  givenname: Jacky
  orcidid: 0000-0002-3803-9600
  surname: Keung
  fullname: Keung, Jacky
  email: jacky.keung@cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
– sequence: 3
  givenname: Xiao
  surname: Yu
  fullname: Yu, Xiao
  email: xyu224-c@my.cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
– sequence: 4
  givenname: Yan
  surname: Xiao
  fullname: Xiao, Yan
  email: xiaoyan.hhu@gmail.com
  organization: School of Computing, National University of Singapore, 117417, Singapore
– sequence: 5
  givenname: Kwabena Ebo
  surname: Bennin
  fullname: Bennin, Kwabena Ebo
  email: kwabena.bennin@wur.nl
  organization: Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
– sequence: 6
  givenname: Md Alamgir
  orcidid: 0000-0002-7136-6339
  surname: Kabir
  fullname: Kabir, Md Alamgir
  email: makabir4-c@my.cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
– sequence: 7
  givenname: Miao
  surname: Zhang
  fullname: Zhang, Miao
  email: miazhang9-c@my.cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, China
CitedBy_id crossref_primary_10_1109_ACCESS_2025_3532250
crossref_primary_10_1049_2023_6293074
crossref_primary_10_1109_TSE_2024_3492204
crossref_primary_10_1142_S0219649223500478
crossref_primary_10_1002_smr_2634
crossref_primary_10_4018_IJSI_309735
crossref_primary_10_1109_ACCESS_2023_3239266
crossref_primary_10_1007_s10489_024_05930_z
crossref_primary_10_1016_j_eswa_2024_125919
crossref_primary_10_1016_j_eswa_2023_122409
crossref_primary_10_1109_ACCESS_2023_3262604
crossref_primary_10_1016_j_asoc_2023_110952
crossref_primary_10_1016_j_eswa_2023_121039
crossref_primary_10_1049_sfw2_12099
crossref_primary_10_1016_j_scico_2024_103164
crossref_primary_10_3390_s21103314
crossref_primary_10_3233_JIFS_221902
crossref_primary_10_1007_s10489_025_06288_6
crossref_primary_10_1109_ACCESS_2024_3396155
crossref_primary_10_1007_s10586_024_04446_y
crossref_primary_10_1016_j_infsof_2021_106662
crossref_primary_10_7717_peerj_cs_2270
crossref_primary_10_1016_j_infsof_2021_106588
crossref_primary_10_1016_j_infsof_2021_106742
crossref_primary_10_1016_j_infsof_2022_107016
crossref_primary_10_1049_2024_5550801
crossref_primary_10_32604_cmc_2024_057538
crossref_primary_10_1016_j_knosys_2024_111835
crossref_primary_10_1109_TR_2023_3295012
crossref_primary_10_1111_exsy_12977
crossref_primary_10_1016_j_ins_2022_07_130
crossref_primary_10_1016_j_infsof_2021_106747
crossref_primary_10_1007_s10462_022_10371_6
crossref_primary_10_1109_ACCESS_2022_3211401
crossref_primary_10_1109_TR_2022_3158949
crossref_primary_10_1145_3699602
crossref_primary_10_1142_S0219467824500451
crossref_primary_10_1109_TR_2024_3393734
crossref_primary_10_1080_1206212X_2023_2252117
crossref_primary_10_3390_app131810466
crossref_primary_10_1007_s11219_023_09615_7
crossref_primary_10_1007_s11334_024_00571_4
crossref_primary_10_1002_smr_2731
crossref_primary_10_1007_s40747_022_00676_y
crossref_primary_10_1186_s40537_023_00715_6
crossref_primary_10_1109_ACCESS_2025_3550583
crossref_primary_10_1109_ACCESS_2022_3211978
crossref_primary_10_1016_j_jjimei_2022_100153
crossref_primary_10_1016_j_infsof_2023_107250
crossref_primary_10_1016_j_eswa_2023_121251
crossref_primary_10_1016_j_jss_2023_111858
crossref_primary_10_1007_s00500_024_09881_y
crossref_primary_10_1016_j_infsof_2022_106985
crossref_primary_10_1016_j_eswa_2023_123041
crossref_primary_10_1007_s10664_022_10186_7
crossref_primary_10_1007_s11334_025_00601_9
crossref_primary_10_4018_IJSSCI_301268
crossref_primary_10_1109_TR_2023_3272651
crossref_primary_10_1155_acis_1013769
crossref_primary_10_1007_s11219_023_09640_6
crossref_primary_10_1007_s11334_021_00399_2
crossref_primary_10_1016_j_jss_2024_112131
crossref_primary_10_1007_s11227_024_06312_5
crossref_primary_10_1016_j_neucom_2024_128538
crossref_primary_10_1016_j_rineng_2025_104123
crossref_primary_10_3233_IDA_226612
crossref_primary_10_3390_sym14122508
crossref_primary_10_1002_spe_3316
crossref_primary_10_1007_s13369_024_08740_0
crossref_primary_10_1016_j_asoc_2022_109069
Cites_doi 10.1109/TIT.1967.1053964
10.1109/IRI.2012.6303039
10.1007/s10515-019-00259-1
10.1007/s00521-007-0089-7
10.1109/TKDE.2012.232
10.1016/j.infsof.2020.106287
10.1016/j.infsof.2007.02.015
10.1016/j.engappai.2017.10.019
10.1016/j.infsof.2019.07.003
10.1016/j.infsof.2011.09.007
10.1007/s10664-018-9633-6
10.1007/s13748-016-0094-0
10.1109/TSE.2007.256941
10.3233/IDA-2002-6504
10.1007/s10664-008-9079-3
10.1007/s00500-018-3093-1
10.1109/ACCESS.2018.2817572
10.1109/TSE.2007.70721
10.1016/j.ins.2016.09.041
10.1109/TSE.2016.2584050
10.1023/A:1008202821328
10.1109/ICMLA.2010.27
10.1109/ASE.2015.56
10.1016/j.bmcl.2005.01.061
10.1613/jair.953
10.1007/s10586-017-0892-6
10.1109/TSE.2005.49
10.1007/s10664-012-9218-8
10.1109/TSE.2012.70
10.1016/j.knosys.2012.12.007
10.1016/j.infsof.2014.12.006
10.1109/TSE.2018.2877678
10.1007/s10664-008-9103-7
10.1007/s11219-016-9342-6
10.1109/TSE.1984.5010196
10.1007/s10515-017-0220-7
10.1016/j.infsof.2014.07.005
10.1016/S0031-3203(96)00142-2
10.1016/j.asoc.2020.106089
10.1016/j.enbuild.2016.05.028
10.1109/TSMCC.2012.2226152
10.1016/j.eswa.2016.06.005
10.1109/TSE.2016.2543218
10.1016/j.infsof.2018.04.001
10.1109/TSE.2017.2731766
10.1016/j.infsof.2014.11.006
10.1016/S0169-7439(99)00047-7
10.1109/TKDE.2017.2779849
10.1145/2950290.2950353
10.1007/s10489-014-0610-5
10.1109/TR.2014.2316951
10.1109/TSE.2016.2599161
10.1016/j.infsof.2017.07.004
ContentType Journal Article
Copyright 2020 Elsevier B.V.
Copyright_xml – notice: 2020 Elsevier B.V.
DOI 10.1016/j.infsof.2020.106432
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Business
EISSN 1873-6025
ExternalDocumentID 10_1016_j_infsof_2020_106432
S0950584920301889
ISSN 0950-5849
IngestDate Thu Apr 24 23:11:01 EDT 2025
Tue Jul 01 02:22:04 EDT 2025
Mon Apr 08 05:17:07 EDT 2024
IsPeerReviewed true
IsScholarly true
Keywords MAHAKIL
Effort-aware defect prediction
SMOTE
Oversampling
Software defect prediction
Class imbalance
Language English
LinkModel DirectLink
ORCID 0000-0002-1575-9891
0000-0002-3803-9600
0000-0002-7136-6339
PublicationCentury 2000
PublicationDate January 2021
2021-01-00
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – month: 01
  year: 2021
  text: January 2021
PublicationDecade 2020
PublicationTitle Information and software technology
PublicationYear 2021
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References He, Bai, Garcia, Li (b24) 2008
Huda, Miah, Hassan, Islam, Yearwood, Alrubaian, Almogren (b79) 2017; 379
Maciejewski, Stefanowski (b64) 2011
Zhang, Song, Wang, Zhang, He, Jia (b19) 2015; 42
Li, He, Zhu, Lyu (b78) 2017
Ostrand, Weyuker, Bell (b8) 2005; 31
Japkowicz, Stephen (b21) 2002; 6
Lin, Hsieh, Liu, Lin, Fang, Wang, Yen, Pal, Chuang (b56) 2017; 30
Zhang, Hassan, McIntosh, Zou (b2) 2016; 43
Bradley (b66) 1997; 30
Menzies, Dekhtyar, Distefano, Greenwald (b74) 2007; 33
Gong, Jiang, Wang, Jiang (b58) 2019
Nagappan, Ball, Zeller (b7) 2006
Yu, Wu, Jian, Bennin, Fu, Ma (b50) 2018; 22
Zhou (b20) 2013; 41
Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg (b72) 2011; 12
Buckley, Poston (b1) 1984; SE-10
Krawczyk (b13) 2016; 5
Drummond, Holte (b52) 2003
Jiang, Cukic, Ma (b63) 2008; 13
Sharmeen, Huda, Abawajy, Hassan (b71) 2020; 89
Guo, Yin, Dong, Yang, Zhou (b53) 2008
Zhang, Deb, Lee, Yang, Shah (b39) 2016; 126
Turhan, Menzies, Bener, Di Stefano (b28) 2009; 14
Xia, Lo, Pan, Nagappan, Wang (b62) 2016; 42
Bennin, Keung, Phannachitta, Monden, Mensah (b12) 2018; 44
Chen, Fang, Shang, Tang (b18) 2018; 26
J. Nam, S. Kim, CLAMI: Defect prediction on unlabeled datasets (T), in: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 452–463
Tax (b25) 2002
Han, Wang, Mao (b23) 2005
Sun, Song, Zhu (b44) 2012; 42
Tantithamthavorn, McIntosh, Hassan, Matsumoto (b75) 2016; 43
Biswas, Suganthan, Mallipeddi, Amaratunga (b38) 2018; 68
Öztürk (b77) 2017; 92
Zhou, Sun, Xia, Li, Chen (b59) 2019; 114
Wong, Leung, Ling (b27) 2013
Kamei, Shihab, Adams, Hassan, Mockus, Sinha, Ubayashi (b31) 2012; 39
Li, Jing, Wu, Zhu, Xu, Ying (b5) 2018; 25
Mende, Koschke (b32) 2010
Xia, Lo, Shihab, Wang, Yang (b46) 2015; 61
A.A. Shanab, T.M. Khoshgoftaar, R. Wald, A. Napolitano, Impact of noise and data sampling on stability of feature ranking techniques for biological datasets, in: 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), 2012, pp. 415–422
Agrawal, Menzies (b30) 2018
Chawla, Bowyer, Hall, Kegelmeyer (b22) 2002; 16
Li, Jing, Zhu, Zhang, Xu, Ying (b61) 2019; 26
Xing, Guo, Lyu (b70) 2005
Okutan, Yıldız (b11) 2014; 19
H. Wang, T.M. Khoshgoftaar, A. Napolitano, A comparative study of ensemble feature selection techniques for software defect prediction, in: 2010 Ninth International Conference on Machine Learning and Applications, 2010, pp. 135–140
Onan, Korukoğlu, Bulut (b40) 2016; 62
Y. Yang, Y. Zhou, J. Liu, Y. Zhao, H. Lu, L. Xu, B. Xu, H. Leung, Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models, in: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016, pp. 157–168.
Weiss, Provost (b14) 2001
Shirabad, Menzies (b37) 2005
Cover, Hart (b68) 1967; 13
Liu, Miao, Zhang (b48) 2014; 63
Ma, Guo, Cukic (b69) 2007
Fan, Diao, Yu, Yang, Chen (b76) 2019; 2019
Huda, Liu, Abdelrazek, Ibrahim, Alyahya, Al-Dossari, Ahmad (b55) 2018; 6
He, Li, Liu, Chen, Ma (b65) 2015; 59
Zhang, Zheng, Zou, Hassan (b34) 2016
Ali, Huda, Abawajy, Alyahya, Al-Dossari, Yearwood (b6) 2017; 20
Wan, Xia, Hassan, Lo, Yin, Yang (b4) 2018
Saçar, Allmer (b42) 2013
Jing, Ying, Zhang, Wu, Liu (b49) 2014
Tomar, Agarwal (b51) 2016; 2016
Bennin, Keung, Monden (b17) 2017
Kampenes, Dybå, Hannay, Sjøberg (b67) 2007; 49
De Maesschalck, Jouan-Rimbaud, Massart (b26) 2000; 50
Storn, Price (b57) 1997; 11
Menzies, Greenwald, Frank (b36) 2006; 33
Limsettho, Bennin, Keung, Hata, Matsumoto (b3) 2018; 100
Tomaszewski, Grahn, Lundberg (b9) 2006
Yoon, Kwek (b15) 2007; 16
Zhang, Li, Zhao, Wang, Pan, Tanaka, Kadota (b41) 2005; 15
Ma, Luo, Zeng, Chen (b10) 2012; 54
Barua, Islam, Yao, Murase (b54) 2012; 26
Provost (b43) 2000
Bennin, Keung, Monden (b16) 2019; 24
Li, Shepperd, Guo (b60) 2020
Laradji, Alshayeb, Ghouti (b45) 2015; 58
Bennin, Keung, Monden, Phannachitta, Mensah (b29) 2017
References_xml – volume: 12
  start-page: 2825
  year: 2011
  end-page: 2830
  ident: b72
  article-title: Scikit-learn: Machine learning in Python
  publication-title: J. Mach. Learn. Res.
– year: 2018
  ident: b4
  article-title: Perceptions, expectations, and challenges in defect prediction
  publication-title: IEEE Trans. Softw. Eng.
– volume: 26
  start-page: 599
  year: 2019
  end-page: 651
  ident: b61
  article-title: Heterogeneous defect prediction with two-stage ensemble learning
  publication-title: Autom. Softw. Eng.
– volume: 49
  start-page: 1073
  year: 2007
  end-page: 1086
  ident: b67
  article-title: A systematic review of effect size in software engineering experiments
  publication-title: Inf. Softw. Technol.
– start-page: 237
  year: 2007
  end-page: 263
  ident: b69
  article-title: A statistical framework for the prediction of fault-proneness
  publication-title: Advances in Machine Learning Applications in Software Engineering
– volume: 42
  start-page: 544
  year: 2015
  end-page: 565
  ident: b19
  article-title: A dissimilarity-based imbalance data classification algorithm
  publication-title: Appl. Intell.
– volume: 41
  start-page: 16
  year: 2013
  end-page: 25
  ident: b20
  article-title: Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods
  publication-title: Knowl.-Based Syst.
– volume: 54
  start-page: 248
  year: 2012
  end-page: 256
  ident: b10
  article-title: Transfer learning for cross-company software defect prediction
  publication-title: Inf. Softw. Technol.
– year: 2002
  ident: b25
  article-title: One-class classification: Concept learning in the absence of counter-examples
– start-page: 630
  year: 2017
  end-page: 635
  ident: b17
  article-title: Impact of the distribution parameter of data sampling approaches on software defect prediction models
  publication-title: 2017 24th Asia-Pacific Software Engineering Conference (APSEC)
– volume: 43
  start-page: 1
  year: 2016
  end-page: 18
  ident: b75
  article-title: An empirical comparison of model validation techniques for defect prediction models
  publication-title: IEEE Trans. Softw. Eng.
– year: 2005
  ident: b37
  article-title: The PROMISE Repository of Software Engineering Databases, vol. 24
– start-page: 318
  year: 2017
  end-page: 328
  ident: b78
  article-title: Software defect prediction via convolutional neural network
  publication-title: 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS)
– start-page: 192
  year: 2008
  end-page: 201
  ident: b53
  article-title: On the class imbalance problem
  publication-title: 2008 Fourth International Conference on Natural Computation, Vol. 4
– volume: 11
  start-page: 341
  year: 1997
  end-page: 359
  ident: b57
  article-title: Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces
  publication-title: J. Global Optim.
– reference: A.A. Shanab, T.M. Khoshgoftaar, R. Wald, A. Napolitano, Impact of noise and data sampling on stability of feature ranking techniques for biological datasets, in: 2012 IEEE 13th International Conference on Information Reuse and Integration (IRI), 2012, pp. 415–422.
– start-page: 487
  year: 2006
  end-page: 496
  ident: b9
  article-title: A method for an accurate early prediction of faults in modified classes
  publication-title: 2006 22nd IEEE International Conference on Software Maintenance
– volume: 22
  start-page: 3461
  year: 2018
  end-page: 3472
  ident: b50
  article-title: Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning
  publication-title: Soft Comput.
– start-page: 1
  year: 2013
  end-page: 6
  ident: b42
  article-title: Data mining for microRNA gene prediction: on the impact of class imbalance and feature number for microRNA gene prediction
  publication-title: 2013 8th International Symposium on Health Informatics and Bioinformatics
– year: 2020
  ident: b60
  article-title: A systematic review of unsupervised learning techniques for software defect prediction
  publication-title: Inf. Softw. Technol.
– volume: 42
  start-page: 1806
  year: 2012
  end-page: 1817
  ident: b44
  article-title: Using coding-based ensemble learning to improve software defect prediction
  publication-title: IEEE Trans. Syst. Man Cybern. B
– volume: 30
  start-page: 950
  year: 2017
  end-page: 962
  ident: b56
  article-title: Minority oversampling in kernel adaptive subspaces for class imbalanced datasets
  publication-title: IEEE Trans. Knowl. Data Eng.
– volume: 62
  start-page: 1
  year: 2016
  end-page: 16
  ident: b40
  article-title: A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification
  publication-title: Expert Syst. Appl.
– volume: SE-10
  start-page: 36
  year: 1984
  end-page: 41
  ident: b1
  article-title: Software quality assurance
  publication-title: IEEE Trans. Softw. Eng.
– volume: 31
  start-page: 340
  year: 2005
  end-page: 355
  ident: b8
  article-title: Predicting the location and number of faults in large software systems
  publication-title: IEEE Trans. Softw. Eng.
– volume: 126
  start-page: 94
  year: 2016
  end-page: 103
  ident: b39
  article-title: Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique
  publication-title: Energy Build.
– start-page: 1050
  year: 2018
  end-page: 1061
  ident: b30
  article-title: Is “better data” better than “better data miners”?
  publication-title: 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)
– volume: 24
  start-page: 602
  year: 2019
  end-page: 636
  ident: b16
  article-title: On the relative value of data resampling approaches for software defect prediction
  publication-title: Empir. Softw. Eng.
– volume: 100
  start-page: 87
  year: 2018
  end-page: 102
  ident: b3
  article-title: Cross project defect prediction using class distribution estimation and oversampling
  publication-title: Inf. Softw. Technol.
– volume: 26
  start-page: 97
  year: 2018
  end-page: 125
  ident: b18
  article-title: Tackling class overlap and imbalance problems in software defect prediction
  publication-title: Softw. Qual. J.
– reference: H. Wang, T.M. Khoshgoftaar, A. Napolitano, A comparative study of ensemble feature selection techniques for software defect prediction, in: 2010 Ninth International Conference on Machine Learning and Applications, 2010, pp. 135–140.
– volume: 5
  start-page: 221
  year: 2016
  end-page: 232
  ident: b13
  article-title: Learning from imbalanced data: open challenges and future directions
  publication-title: Prog. Artif. Intell.
– year: 2001
  ident: b14
  article-title: The effect of class distribution on classifier learning: an empirical study
– volume: 13
  start-page: 21
  year: 1967
  end-page: 27
  ident: b68
  article-title: Nearest neighbor pattern classification
  publication-title: IEEE Trans. Inform. Theory
– volume: 42
  start-page: 977
  year: 2016
  end-page: 998
  ident: b62
  article-title: Hydra: Massively compositional model for cross-project defect prediction
  publication-title: IEEE Trans. Softw. Eng.
– start-page: 1
  year: 2000
  end-page: 3
  ident: b43
  article-title: Machine learning from imbalanced data sets 101
  publication-title: Proceedings of the AAAI’2000 Workshop on Imbalanced Data Sets, vol. 68
– volume: 19
  start-page: 154
  year: 2014
  end-page: 181
  ident: b11
  article-title: Software defect prediction using Bayesian networks
  publication-title: Empir. Softw. Eng.
– volume: 50
  start-page: 1
  year: 2000
  end-page: 18
  ident: b26
  article-title: The Mahalanobis distance
  publication-title: Chemometr. Intell. Lab. Syst.
– volume: 6
  start-page: 24184
  year: 2018
  end-page: 24195
  ident: b55
  article-title: An ensemble oversampling model for class imbalance problem in software defect prediction
  publication-title: IEEE Access
– volume: 16
  start-page: 321
  year: 2002
  end-page: 357
  ident: b22
  article-title: SMOTE: synthetic minority over-sampling technique
  publication-title: J. Artificial Intelligence Res.
– volume: 114
  start-page: 204
  year: 2019
  end-page: 216
  ident: b59
  article-title: Improving defect prediction with deep forest
  publication-title: Inf. Softw. Technol.
– volume: 59
  start-page: 170
  year: 2015
  end-page: 190
  ident: b65
  article-title: An empirical study on software defect prediction with a simplified metric set
  publication-title: Inf. Softw. Technol.
– start-page: 2354
  year: 2013
  end-page: 2359
  ident: b27
  article-title: A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets
  publication-title: IECON 2013 - 39th Annual Conference of the IEEE Industrial Electronics Society
– volume: 63
  start-page: 676
  year: 2014
  end-page: 686
  ident: b48
  article-title: Two-stage cost-sensitive learning for software defect prediction
  publication-title: IEEE Trans. Reliab.
– reference: J. Nam, S. Kim, CLAMI: Defect prediction on unlabeled datasets (T), in: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015, pp. 452–463.
– reference: Y. Yang, Y. Zhou, J. Liu, Y. Zhao, H. Lu, L. Xu, B. Xu, H. Leung, Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models, in: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2016, pp. 157–168.
– volume: 58
  start-page: 388
  year: 2015
  end-page: 402
  ident: b45
  article-title: Software defect prediction using ensemble learning on selected features
  publication-title: Inf. Softw. Technol.
– volume: 92
  start-page: 17
  year: 2017
  end-page: 29
  ident: b77
  article-title: Which type of metrics are useful to deal with class imbalance in software defect prediction?
  publication-title: Inf. Softw. Technol.
– start-page: 309
  year: 2016
  end-page: 320
  ident: b34
  article-title: Cross-project defect prediction using a connectivity-based unsupervised classifier
  publication-title: Proceedings of the 38th International Conference on Software Engineering
– year: 2005
  ident: b70
  article-title: A novel method for early software quality prediction based on support vector machine
  publication-title: 16th IEEE International Symposium on Software Reliability Engineering (ISSRE’05)
– volume: 25
  start-page: 201
  year: 2018
  end-page: 245
  ident: b5
  article-title: Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction
  publication-title: Autom. Softw. Eng.
– volume: 13
  start-page: 561
  year: 2008
  end-page: 595
  ident: b63
  article-title: Techniques for evaluating fault prediction models
  publication-title: Empir. Softw. Eng.
– start-page: 698
  year: 2019
  end-page: 709
  ident: b58
  article-title: Empirical evaluation of the impact of class overlap on software defect prediction
  publication-title: 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)
– start-page: 1322
  year: 2008
  end-page: 1328
  ident: b24
  article-title: ADASYN: Adaptive synthetic sampling approach for imbalanced learning
  publication-title: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)
– start-page: 107
  year: 2010
  end-page: 116
  ident: b32
  article-title: Effort-aware defect prediction models
  publication-title: 2010 14th European Conference on Software Maintenance and Reengineering
– volume: 2019
  year: 2019
  ident: b76
  article-title: Software defect prediction via attention-based recurrent neural network
  publication-title: Sci. Program.
– volume: 44
  start-page: 534
  year: 2018
  end-page: 550
  ident: b12
  article-title: MAHAKIL: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction
  publication-title: IEEE Trans. Softw. Eng.
– volume: 26
  start-page: 405
  year: 2012
  end-page: 425
  ident: b54
  article-title: MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning
  publication-title: IEEE Trans. Knowl. Data Eng.
– volume: 20
  start-page: 2267
  year: 2017
  end-page: 2281
  ident: b6
  article-title: A parallel framework for software defect detection and metric selection on cloud computing
  publication-title: Cluster Comput.
– volume: 379
  start-page: 211
  year: 2017
  end-page: 228
  ident: b79
  article-title: Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data
  publication-title: Inform. Sci.
– volume: 2016
  start-page: 6:6
  year: 2016
  ident: b51
  article-title: Prediction of defective software modules using class imbalance learning
  publication-title: Appl. Comp. Intell. Soft Comput.
– volume: 30
  start-page: 1145
  year: 1997
  end-page: 1159
  ident: b66
  article-title: The use of the area under the ROC curve in the evaluation of machine learning algorithms
  publication-title: Pattern Recognit.
– volume: 89
  year: 2020
  ident: b71
  article-title: An adaptive framework against android privilege escalation threats using deep learning and semi-supervised approaches
  publication-title: Appl. Soft Comput.
– volume: 43
  start-page: 476
  year: 2016
  end-page: 491
  ident: b2
  article-title: The use of summation to aggregate software metrics hinders the performance of defect prediction models
  publication-title: IEEE Trans. Softw. Eng.
– volume: 16
  start-page: 295
  year: 2007
  end-page: 306
  ident: b15
  article-title: A data reduction approach for resolving the imbalanced data issue in functional genomics
  publication-title: Neural Comput. Appl.
– volume: 61
  start-page: 93
  year: 2015
  end-page: 106
  ident: b46
  article-title: ELBlocker: Predicting blocking bugs with ensemble imbalance learning
  publication-title: Inf. Softw. Technol.
– volume: 14
  start-page: 540
  year: 2009
  end-page: 578
  ident: b28
  article-title: On the relative value of cross-company and within-company data for defect prediction
  publication-title: Empir. Softw. Eng.
– volume: 6
  start-page: 429
  year: 2002
  end-page: 449
  ident: b21
  article-title: The class imbalance problem: A systematic study
  publication-title: Intell. Data Anal.
– reference: .
– start-page: 1
  year: 2003
  end-page: 8
  ident: b52
  article-title: C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling
  publication-title: Workshop on Learning from Imbalanced Datasets II, vol. 11
– start-page: 452
  year: 2006
  end-page: 461
  ident: b7
  article-title: Mining metrics to predict component failures
  publication-title: Proceedings of the 28th International Conference on Software Engineering
– start-page: 364
  year: 2017
  end-page: 373
  ident: b29
  article-title: The significant effects of data sampling approaches on software defect prioritization and classification
  publication-title: 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM)
– volume: 39
  start-page: 757
  year: 2012
  end-page: 773
  ident: b31
  article-title: A large-scale empirical study of just-in-time quality assurance
  publication-title: IEEE Trans. Softw. Eng.
– volume: 68
  start-page: 81
  year: 2018
  end-page: 100
  ident: b38
  article-title: Optimal power flow solutions using differential evolution algorithm integrated with effective constraint handling techniques
  publication-title: Eng. Appl. Artif. Intell.
– start-page: 878
  year: 2005
  end-page: 887
  ident: b23
  article-title: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
  publication-title: International Conference on Intelligent Computing
– volume: 15
  start-page: 1629
  year: 2005
  end-page: 1632
  ident: b41
  article-title: Synthesis and activity of oleanolic acid derivatives, a novel class of inhibitors of osteoclast formation
  publication-title: Bioorganic Med. Chem. Lett.
– volume: 33
  start-page: 637
  year: 2007
  end-page: 640
  ident: b74
  article-title: Problems with precision: A response to “comments on’data mining static code attributes to learn defect predictors’”
  publication-title: IEEE Trans. Softw. Eng.
– start-page: 414
  year: 2014
  end-page: 423
  ident: b49
  article-title: Dictionary learning based software defect prediction
  publication-title: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014
– start-page: 104
  year: 2011
  end-page: 111
  ident: b64
  article-title: Local neighbourhood extension of SMOTE for mining imbalanced data
  publication-title: 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
– volume: 33
  start-page: 2
  year: 2006
  end-page: 13
  ident: b36
  article-title: Data mining static code attributes to learn defect predictors
  publication-title: IEEE Trans. Softw. Eng.
– volume: 2019
  year: 2019
  ident: 10.1016/j.infsof.2020.106432_b76
  article-title: Software defect prediction via attention-based recurrent neural network
  publication-title: Sci. Program.
– volume: 13
  start-page: 21
  issue: 1
  year: 1967
  ident: 10.1016/j.infsof.2020.106432_b68
  article-title: Nearest neighbor pattern classification
  publication-title: IEEE Trans. Inform. Theory
  doi: 10.1109/TIT.1967.1053964
– ident: 10.1016/j.infsof.2020.106432_b73
  doi: 10.1109/IRI.2012.6303039
– volume: 26
  start-page: 599
  issue: 3
  year: 2019
  ident: 10.1016/j.infsof.2020.106432_b61
  article-title: Heterogeneous defect prediction with two-stage ensemble learning
  publication-title: Autom. Softw. Eng.
  doi: 10.1007/s10515-019-00259-1
– volume: 16
  start-page: 295
  issue: 3
  year: 2007
  ident: 10.1016/j.infsof.2020.106432_b15
  article-title: A data reduction approach for resolving the imbalanced data issue in functional genomics
  publication-title: Neural Comput. Appl.
  doi: 10.1007/s00521-007-0089-7
– volume: 26
  start-page: 405
  issue: 2
  year: 2012
  ident: 10.1016/j.infsof.2020.106432_b54
  article-title: MWMOTE–majority weighted minority oversampling technique for imbalanced data set learning
  publication-title: IEEE Trans. Knowl. Data Eng.
  doi: 10.1109/TKDE.2012.232
– start-page: 698
  year: 2019
  ident: 10.1016/j.infsof.2020.106432_b58
  article-title: Empirical evaluation of the impact of class overlap on software defect prediction
– year: 2020
  ident: 10.1016/j.infsof.2020.106432_b60
  article-title: A systematic review of unsupervised learning techniques for software defect prediction
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2020.106287
– volume: 49
  start-page: 1073
  issue: 11–12
  year: 2007
  ident: 10.1016/j.infsof.2020.106432_b67
  article-title: A systematic review of effect size in software engineering experiments
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2007.02.015
– volume: 68
  start-page: 81
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b38
  article-title: Optimal power flow solutions using differential evolution algorithm integrated with effective constraint handling techniques
  publication-title: Eng. Appl. Artif. Intell.
  doi: 10.1016/j.engappai.2017.10.019
– volume: 114
  start-page: 204
  year: 2019
  ident: 10.1016/j.infsof.2020.106432_b59
  article-title: Improving defect prediction with deep forest
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2019.07.003
– volume: 54
  start-page: 248
  issue: 3
  year: 2012
  ident: 10.1016/j.infsof.2020.106432_b10
  article-title: Transfer learning for cross-company software defect prediction
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2011.09.007
– volume: 24
  start-page: 602
  issue: 2
  year: 2019
  ident: 10.1016/j.infsof.2020.106432_b16
  article-title: On the relative value of data resampling approaches for software defect prediction
  publication-title: Empir. Softw. Eng.
  doi: 10.1007/s10664-018-9633-6
– volume: 5
  start-page: 221
  issue: 4
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b13
  article-title: Learning from imbalanced data: open challenges and future directions
  publication-title: Prog. Artif. Intell.
  doi: 10.1007/s13748-016-0094-0
– start-page: 309
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b34
  article-title: Cross-project defect prediction using a connectivity-based unsupervised classifier
– volume: 33
  start-page: 2
  issue: 1
  year: 2006
  ident: 10.1016/j.infsof.2020.106432_b36
  article-title: Data mining static code attributes to learn defect predictors
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2007.256941
– volume: 6
  start-page: 429
  issue: 5
  year: 2002
  ident: 10.1016/j.infsof.2020.106432_b21
  article-title: The class imbalance problem: A systematic study
  publication-title: Intell. Data Anal.
  doi: 10.3233/IDA-2002-6504
– year: 2005
  ident: 10.1016/j.infsof.2020.106432_b37
– start-page: 414
  year: 2014
  ident: 10.1016/j.infsof.2020.106432_b49
  article-title: Dictionary learning based software defect prediction
– volume: 13
  start-page: 561
  issue: 5
  year: 2008
  ident: 10.1016/j.infsof.2020.106432_b63
  article-title: Techniques for evaluating fault prediction models
  publication-title: Empir. Softw. Eng.
  doi: 10.1007/s10664-008-9079-3
– start-page: 1
  year: 2003
  ident: 10.1016/j.infsof.2020.106432_b52
  article-title: C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling
– volume: 22
  start-page: 3461
  issue: 10
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b50
  article-title: Cross-company defect prediction via semi-supervised clustering-based data filtering and MSTrA-based transfer learning
  publication-title: Soft Comput.
  doi: 10.1007/s00500-018-3093-1
– volume: 6
  start-page: 24184
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b55
  article-title: An ensemble oversampling model for class imbalance problem in software defect prediction
  publication-title: IEEE Access
  doi: 10.1109/ACCESS.2018.2817572
– start-page: 104
  year: 2011
  ident: 10.1016/j.infsof.2020.106432_b64
  article-title: Local neighbourhood extension of SMOTE for mining imbalanced data
– volume: 33
  start-page: 637
  issue: 9
  year: 2007
  ident: 10.1016/j.infsof.2020.106432_b74
  article-title: Problems with precision: A response to “comments on’data mining static code attributes to learn defect predictors’”
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2007.70721
– start-page: 364
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b29
  article-title: The significant effects of data sampling approaches on software defect prioritization and classification
– start-page: 452
  year: 2006
  ident: 10.1016/j.infsof.2020.106432_b7
  article-title: Mining metrics to predict component failures
– start-page: 2354
  year: 2013
  ident: 10.1016/j.infsof.2020.106432_b27
  article-title: A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets
– volume: 379
  start-page: 211
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b79
  article-title: Defending unknown attacks on cyber-physical systems by semi-supervised approach and available unlabeled data
  publication-title: Inform. Sci.
  doi: 10.1016/j.ins.2016.09.041
– volume: 43
  start-page: 1
  issue: 1
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b75
  article-title: An empirical comparison of model validation techniques for defect prediction models
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2016.2584050
– volume: 11
  start-page: 341
  issue: 4
  year: 1997
  ident: 10.1016/j.infsof.2020.106432_b57
  article-title: Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces
  publication-title: J. Global Optim.
  doi: 10.1023/A:1008202821328
– ident: 10.1016/j.infsof.2020.106432_b47
  doi: 10.1109/ICMLA.2010.27
– year: 2001
  ident: 10.1016/j.infsof.2020.106432_b14
– volume: 12
  start-page: 2825
  year: 2011
  ident: 10.1016/j.infsof.2020.106432_b72
  article-title: Scikit-learn: Machine learning in python
  publication-title: J. Mach. Learn. Res.
– start-page: 878
  year: 2005
  ident: 10.1016/j.infsof.2020.106432_b23
  article-title: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning
– ident: 10.1016/j.infsof.2020.106432_b35
  doi: 10.1109/ASE.2015.56
– start-page: 237
  year: 2007
  ident: 10.1016/j.infsof.2020.106432_b69
  article-title: A statistical framework for the prediction of fault-proneness
– volume: 15
  start-page: 1629
  issue: 6
  year: 2005
  ident: 10.1016/j.infsof.2020.106432_b41
  article-title: Synthesis and activity of oleanolic acid derivatives, a novel class of inhibitors of osteoclast formation
  publication-title: Bioorganic Med. Chem. Lett.
  doi: 10.1016/j.bmcl.2005.01.061
– volume: 16
  start-page: 321
  year: 2002
  ident: 10.1016/j.infsof.2020.106432_b22
  article-title: SMOTE: synthetic minority over-sampling technique
  publication-title: J. Artificial Intelligence Res.
  doi: 10.1613/jair.953
– volume: 20
  start-page: 2267
  issue: 3
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b6
  article-title: A parallel framework for software defect detection and metric selection on cloud computing
  publication-title: Cluster Comput.
  doi: 10.1007/s10586-017-0892-6
– volume: 31
  start-page: 340
  issue: 4
  year: 2005
  ident: 10.1016/j.infsof.2020.106432_b8
  article-title: Predicting the location and number of faults in large software systems
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2005.49
– volume: 19
  start-page: 154
  issue: 1
  year: 2014
  ident: 10.1016/j.infsof.2020.106432_b11
  article-title: Software defect prediction using Bayesian networks
  publication-title: Empir. Softw. Eng.
  doi: 10.1007/s10664-012-9218-8
– volume: 39
  start-page: 757
  issue: 6
  year: 2012
  ident: 10.1016/j.infsof.2020.106432_b31
  article-title: A large-scale empirical study of just-in-time quality assurance
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2012.70
– volume: 41
  start-page: 16
  year: 2013
  ident: 10.1016/j.infsof.2020.106432_b20
  article-title: Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods
  publication-title: Knowl.-Based Syst.
  doi: 10.1016/j.knosys.2012.12.007
– start-page: 1
  year: 2000
  ident: 10.1016/j.infsof.2020.106432_b43
  article-title: Machine learning from imbalanced data sets 101
– start-page: 107
  year: 2010
  ident: 10.1016/j.infsof.2020.106432_b32
  article-title: Effort-aware defect prediction models
– volume: 61
  start-page: 93
  year: 2015
  ident: 10.1016/j.infsof.2020.106432_b46
  article-title: ELBlocker: Predicting blocking bugs with ensemble imbalance learning
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2014.12.006
– year: 2018
  ident: 10.1016/j.infsof.2020.106432_b4
  article-title: Perceptions, expectations, and challenges in defect prediction
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2018.2877678
– volume: 14
  start-page: 540
  issue: 5
  year: 2009
  ident: 10.1016/j.infsof.2020.106432_b28
  article-title: On the relative value of cross-company and within-company data for defect prediction
  publication-title: Empir. Softw. Eng.
  doi: 10.1007/s10664-008-9103-7
– start-page: 318
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b78
  article-title: Software defect prediction via convolutional neural network
– volume: 26
  start-page: 97
  issue: 1
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b18
  article-title: Tackling class overlap and imbalance problems in software defect prediction
  publication-title: Softw. Qual. J.
  doi: 10.1007/s11219-016-9342-6
– volume: SE-10
  start-page: 36
  issue: 1
  year: 1984
  ident: 10.1016/j.infsof.2020.106432_b1
  article-title: Software quality assurance
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.1984.5010196
– volume: 25
  start-page: 201
  issue: 2
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b5
  article-title: Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction
  publication-title: Autom. Softw. Eng.
  doi: 10.1007/s10515-017-0220-7
– start-page: 1322
  year: 2008
  ident: 10.1016/j.infsof.2020.106432_b24
  article-title: ADASYN: Adaptive synthetic sampling approach for imbalanced learning
– volume: 58
  start-page: 388
  year: 2015
  ident: 10.1016/j.infsof.2020.106432_b45
  article-title: Software defect prediction using ensemble learning on selected features
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2014.07.005
– volume: 30
  start-page: 1145
  issue: 7
  year: 1997
  ident: 10.1016/j.infsof.2020.106432_b66
  article-title: The use of the area under the ROC curve in the evaluation of machine learning algorithms
  publication-title: Pattern Recognit.
  doi: 10.1016/S0031-3203(96)00142-2
– start-page: 192
  year: 2008
  ident: 10.1016/j.infsof.2020.106432_b53
  article-title: On the class imbalance problem
– volume: 89
  year: 2020
  ident: 10.1016/j.infsof.2020.106432_b71
  article-title: An adaptive framework against android privilege escalation threats using deep learning and semi-supervised approaches
  publication-title: Appl. Soft Comput.
  doi: 10.1016/j.asoc.2020.106089
– start-page: 1050
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b30
  article-title: Is “better data” better than “better data miners”?
– volume: 126
  start-page: 94
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b39
  article-title: Time series forecasting for building energy consumption using weighted support vector regression with differential evolution optimization technique
  publication-title: Energy Build.
  doi: 10.1016/j.enbuild.2016.05.028
– volume: 42
  start-page: 1806
  issue: 6
  year: 2012
  ident: 10.1016/j.infsof.2020.106432_b44
  article-title: Using coding-based ensemble learning to improve software defect prediction
  publication-title: IEEE Trans. Syst. Man Cybern. B
  doi: 10.1109/TSMCC.2012.2226152
– volume: 62
  start-page: 1
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b40
  article-title: A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification
  publication-title: Expert Syst. Appl.
  doi: 10.1016/j.eswa.2016.06.005
– volume: 42
  start-page: 977
  issue: 10
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b62
  article-title: Hydra: Massively compositional model for cross-project defect prediction
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2016.2543218
– volume: 100
  start-page: 87
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b3
  article-title: Cross project defect prediction using class distribution estimation and oversampling
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2018.04.001
– year: 2002
  ident: 10.1016/j.infsof.2020.106432_b25
– volume: 44
  start-page: 534
  issue: 6
  year: 2018
  ident: 10.1016/j.infsof.2020.106432_b12
  article-title: MAHAKIL: Diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2017.2731766
– start-page: 1
  year: 2013
  ident: 10.1016/j.infsof.2020.106432_b42
  article-title: Data mining for microrna gene prediction: on the impact of class imbalance and feature number for microrna gene prediction
– volume: 59
  start-page: 170
  year: 2015
  ident: 10.1016/j.infsof.2020.106432_b65
  article-title: An empirical study on software defect prediction with a simplified metric set
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2014.11.006
– volume: 50
  start-page: 1
  issue: 1
  year: 2000
  ident: 10.1016/j.infsof.2020.106432_b26
  article-title: The mahalanobis distance
  publication-title: Chemometr. Intell. Lab. Syst.
  doi: 10.1016/S0169-7439(99)00047-7
– volume: 30
  start-page: 950
  issue: 5
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b56
  article-title: Minority oversampling in kernel adaptive subspaces for class imbalanced datasets
  publication-title: IEEE Trans. Knowl. Data Eng.
  doi: 10.1109/TKDE.2017.2779849
– ident: 10.1016/j.infsof.2020.106432_b33
  doi: 10.1145/2950290.2950353
– year: 2005
  ident: 10.1016/j.infsof.2020.106432_b70
  article-title: A novel method for early software quality prediction based on support vector machine
– start-page: 487
  year: 2006
  ident: 10.1016/j.infsof.2020.106432_b9
  article-title: A method for an accurate early prediction of faults in modified classes
– volume: 42
  start-page: 544
  issue: 3
  year: 2015
  ident: 10.1016/j.infsof.2020.106432_b19
  article-title: A dissimilarity-based imbalance data classification algorithm
  publication-title: Appl. Intell.
  doi: 10.1007/s10489-014-0610-5
– volume: 63
  start-page: 676
  issue: 2
  year: 2014
  ident: 10.1016/j.infsof.2020.106432_b48
  article-title: Two-stage cost-sensitive learning for software defect prediction
  publication-title: IEEE Trans. Reliab.
  doi: 10.1109/TR.2014.2316951
– volume: 2016
  start-page: 6:6
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b51
  article-title: Prediction of defective software modules using class imbalance learning
  publication-title: Appl. Comp. Intell. Soft Comput.
– volume: 43
  start-page: 476
  issue: 5
  year: 2016
  ident: 10.1016/j.infsof.2020.106432_b2
  article-title: The use of summation to aggregate software metrics hinders the performance of defect prediction models
  publication-title: IEEE Trans. Softw. Eng.
  doi: 10.1109/TSE.2016.2599161
– volume: 92
  start-page: 17
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b77
  article-title: Which type of metrics are useful to deal with class imbalance in software defect prediction?
  publication-title: Inf. Softw. Technol.
  doi: 10.1016/j.infsof.2017.07.004
– start-page: 630
  year: 2017
  ident: 10.1016/j.infsof.2020.106432_b17
  article-title: Impact of the distribution parameter of data sampling approaches on software defect prediction models
Title: COSTE: Complexity-based OverSampling TEchnique to alleviate the class imbalance problem in software defect prediction
Volume: 129
Start page: 106432
DOI: https://dx.doi.org/10.1016/j.infsof.2020.106432
Subject terms: Class imbalance; Effort-aware defect prediction; MAHAKIL; Oversampling; SMOTE; Software defect prediction