DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction

Bibliographic Details
Published in	IEEE Transactions on Software Engineering, Vol. 49, no. 4, pp. 1941-1963
Main Authors	Li, Zhiqiang; Zhang, Hongyu; Jing, Xiao-Yuan; Xie, Juanying; Guo, Min; Ren, Jie
Format	Journal Article
Language	English
Published	New York: IEEE, 01.04.2023; IEEE Computer Society
Subjects	Cross-project defect prediction; Data models; data sampling; data selection; Defects; domain programming predictor; Domains; Knowledge management; Measurement; Parameters; Prediction algorithms; Predictive models; Programming; Sampling; software quality assurance; Transfer learning; Tuning
Summary:	Cross-project defect prediction (CPDP) refers to recognizing defective software modules in one project (i.e., the target) using historical data collected from other projects (i.e., the sources), which can help developers find defects and prioritize their testing efforts. Unfortunately, there is often a large distribution difference between the source and target data. Most CPDP methods neglect to select appropriate source data for a given target at the project level. More importantly, existing CPDP models are parametric methods, which usually require intensive parameter selection and tuning to achieve better prediction performance. This hinders the wide applicability of CPDP in practice. Moreover, most CPDP methods do not address the cross-project class imbalance problem. These limitations lead to suboptimal CPDP results. In this paper, we propose a novel data selection and sampling based domain programming predictor (DSSDPP) for CPDP, which addresses the above limitations. DSSDPP is a non-parametric CPDP method that can perform knowledge transfer across projects without the need for parameter selection and tuning. By exploiting the structures of the source and target data, DSSDPP can learn a discriminative transfer classifier for identifying defects in the target project. Extensive experiments on 22 projects from four datasets indicate that DSSDPP achieves better MCC and AUC results than a range of competing methods in both the single-source and multi-source scenarios. Since DSSDPP is easy, effective, extensible, and efficient, we suggest that future work use it with well-chosen source data to conduct CPDP, especially for projects with a limited computational budget.
ISSN:	0098-5589 1939-3520
DOI:	10.1109/TSE.2022.3204589