Improving Ranking-Oriented Defect Prediction Using a Cost-Sensitive Ranking SVM

Bibliographic Details
Published in	IEEE Transactions on Reliability, Vol. 69, no. 1, pp. 139-153
Main Authors	Yu, Xiao; Liu, Jin; Keung, Jacky Wai; Li, Qing; Bennin, Kwabena Ebo; Xu, Zhou; Wang, Junping; Cui, Xiaohui
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2020
Subjects	Algorithms; Computer science; Cost-sensitive learning; Data imbalance; Datasets; Decision tree regression; Decision trees; Defect prediction; Defects; Fault percentile averages; Forecasting; Genetic algorithms; Information retrieval; Learning systems; Learning to rank; Machine learning; Modules; Oversampling; Performance enhancement; Prediction algorithms; Predictive models; Random under samplings; Ranking; Ranking support vector machines (SVM); Ranking-oriented defect prediction (RODP); Regression; Regression analysis; Resampling; Software; Software algorithms; Software defect prediction; Software testing; Support vector machines; Synthetic minority over-sampling techniques; Teaching methods; Testing; Trees (mathematics)
Summary:	Context: Ranking-oriented defect prediction (RODP) ranks software modules so that limited testing resources can be allocated to each module according to its predicted number of defects. Most RODP methods overlook two issues. First, misranking a module with more defects costs far more than misranking a module with fewer defects, because fewer testing resources are allocated to it and finding all of its defects becomes difficult. Second, the numbers of defects per module are highly imbalanced in defect datasets. Cost-sensitive learning is an effective technique for handling both the cost issue and the data imbalance problem in software defect prediction, but its effectiveness has not been investigated in RODP models. Aims: In this article, we propose a cost-sensitive ranking support vector machine (SVM) algorithm, CSRankSVM, to improve the performance of RODP models. Method: CSRankSVM modifies the loss function of the ranking SVM algorithm by adding two penalty parameters that address the cost issue and the data imbalance problem. The loss function of CSRankSVM is then optimized using a genetic algorithm. Results: On 11 project datasets with 41 releases, CSRankSVM achieves 1.12%-15.68% higher average fault percentile average (FPA) values than five existing RODP methods (decision tree regression, linear regression, Bayesian ridge regression, ranking SVM, and learning-to-rank (LTR)) and 1.08%-15.74% higher average FPA values than four data imbalance learning methods (random undersampling and the synthetic minority oversampling technique, two data resampling methods; RankBoost, an ensemble learning method; and IRSVM, a cost-sensitive ranking SVM method for information retrieval). Conclusion: CSRankSVM can handle the cost issue and the data imbalance problem in RODP and achieves better performance than the compared methods. CSRankSVM is therefore recommended as an effective method for RODP.
ISSN:	0018-9529, 1558-1721
DOI:	10.1109/TR.2019.2931559
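
Illustrative note: The summary above names two ingredients that are easy to misread without a concrete form: a ranking SVM loss with extra penalty terms, and the FPA metric used for evaluation. The sketch below is only an illustration and is not taken from the article. The function names are hypothetical. The abstract does not give the form of the two penalty parameters, so a single caller-supplied `pair_weight` function stands in for them, and regularization and the genetic-algorithm search are omitted. The FPA function assumes the commonly used definition: the average, over every cutoff m, of the fraction of all defects contained in the top m ranked modules.

```python
from typing import Callable, Sequence


def weighted_pairwise_hinge(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    pair_weight: Callable[[int, int], float],
) -> float:
    """Pairwise hinge loss of a linear scorer, with a weight per module pair.

    pair_weight is a hypothetical stand-in for the article's two penalty
    parameters, whose exact form the abstract does not state.
    """
    loss = 0.0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # module i should be ranked above module j
                margin = sum(wk * (a - b) for wk, a, b in zip(w, X[i], X[j]))
                loss += pair_weight(y[i], y[j]) * max(0.0, 1.0 - margin)
    return loss


def fpa(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Fault percentile average of the ranking induced by predicted scores."""
    total = sum(actual)
    if total == 0:
        raise ValueError("FPA is undefined when there are no defects")
    # Actual defect counts, ordered from highest to lowest predicted score.
    ranked = [a for _, a in sorted(zip(predicted, actual), key=lambda p: -p[0])]
    covered = 0  # defects contained in the top-m modules so far
    area = 0.0   # running sum of the top-m defect fractions
    for n in ranked:
        covered += n
        area += covered / total
    return area / len(ranked)
```

For three modules with 3, 0, and 1 actual defects, `fpa([0.9, 0.1, 0.5], [3, 0, 1])` ranks them in the ideal order and returns about 0.917, while `fpa([0.1, 0.9, 0.5], [3, 0, 1])` puts the most defective module last and returns about 0.417. A weight such as `lambda a, b: a - b` makes errors on pairs with a large defect gap cost more, which is one possible reading of the cost issue the summary describes.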