Cross-Project and Within-Project Semisupervised Software Defect Prediction: A Unified Approach

Bibliographic Details
Published in IEEE Transactions on Reliability, Vol. 67, no. 2, pp. 581-597
Main Authors Wu, Fei; Jing, Xiao-Yuan; Sun, Ying; Sun, Jing; Huang, Lin; Cui, Fangyi; Sun, Yanfei
Format Journal Article
Language English
Published IEEE 01.06.2018
Summary: When there are not enough historical defect data to build an accurate prediction model, semisupervised defect prediction (SSDP) and cross-project defect prediction (CPDP) are two feasible solutions. Existing CPDP methods assume that the available source data are well labeled. However, because labeling a large amount of defect data requires expensive human effort, usually only suitable unlabeled source data can be used. We call CPDP in this scenario cross-project semisupervised defect prediction (CSDP). Although some within-project semisupervised defect prediction (WSDP) methods have been developed in recent years, there is still much room for improvement in prediction performance. In this paper, we aim to provide a unified and effective solution for both the CSDP and WSDP problems. We introduce the semisupervised dictionary learning technique and propose a cost-sensitive kernelized semisupervised dictionary learning (CKSDL) approach. CKSDL can make full use of the limited labeled defect data and a large amount of unlabeled data in the kernel space. In addition, CKSDL considers misclassification costs in the dictionary learning process. Extensive experiments on 16 projects indicate that CKSDL outperforms state-of-the-art WSDP methods, that using unlabeled cross-project defect data can help improve WSDP performance, and that CKSDL generally obtains significantly better prediction performance than related SSDP methods in the CSDP scenario.
ISSN: 0018-9529, 1558-1721
DOI: 10.1109/TR.2018.2804922
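
The summary notes that CKSDL takes misclassification costs into account. As a generic illustration of that idea only, the sketch below applies the standard expected-cost decision rule to predicted defect probabilities. It is not the CKSDL algorithm, which builds costs into dictionary learning itself. The function names and the cost values (a missed defect costing five times a false alarm) are assumptions chosen for illustration and do not come from the paper.

```python
from typing import List


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimizes expected misclassification cost.

    Predicting "clean" for a module with defect probability p costs p * cost_fn;
    predicting "defective" costs (1 - p) * cost_fp. Predicting "defective" is
    cheaper whenever p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_defective(
    prob_defective: List[float],
    cost_fp: float = 1.0,  # hypothetical cost of a false alarm
    cost_fn: float = 5.0,  # hypothetical cost of a missed defect
) -> List[bool]:
    """Label each module defective when that choice has the lower expected cost."""
    threshold = cost_sensitive_threshold(cost_fp, cost_fn)
    return [p >= threshold for p in prob_defective]


if __name__ == "__main__":
    probabilities = [0.1, 0.2, 0.6]  # made-up model outputs
    print(predict_defective(probabilities))            # costs 1 : 5
    print(predict_defective(probabilities, 1.0, 1.0))  # equal costs
```

With unequal costs the threshold falls below 0.5, so modules with a modest defect probability are still flagged. This reflects the usual motivation for cost-sensitive defect prediction: a missed defect is generally treated as more expensive than a false alarm.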