Towards an understanding of intra-defect associations: Implications for defect prediction

Bibliographic Details
Published in: The Journal of systems and software Vol. 207; p. 111858
Main Authors: Zhao, Yangyang; Jiang, Mingyue; Yang, Yibiao; Zhou, Yuming; Ma, Hanjie; Ding, Zuohua
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.01.2024
Summary: In previous studies, when defect data are collected and the fix of a defect spans multiple modules, each involved module is labeled as defective. Defect prediction models are then built from the features of each individual module, ignoring the potential associations between the modules involved in the same defect (referred to as “intra-defect associations”). Considering the possibility of numerous cross-module defects in practice, we hypothesize that these intra-defect associations could play a crucial role in enhancing defect prediction performance. Unfortunately, there is no empirical evidence to confirm or refute this hypothesis. We are therefore motivated to conduct a comprehensive study of the implications of intra-defect associations for defect prediction. We first examine the proportion of cross-module defects and the relationships between the involved modules. The results reveal that, at the function level, the majority of defects occur across functions, and most of the cross-module defects exhibit only implicit dependencies. Inspired by these findings, we propose a novel data processing approach for building defect prediction models. This approach leverages the intra-defect associations by merging the modules involved in a defect into new instances, whose variables are the mean or median of the merged modules' variables, to augment the training data. The experimental results indicate that considering intra-defect associations can significantly improve defect prediction performance in both the ranking and classification scenarios. This study provides valuable insights into the implications of intra-defect associations for defect prediction.
• The first to leverage intra-defect associations for defect prediction.
• A novel data processing approach for building defect prediction models.
• The majority of defects occur across functions.
• Most cross-module defects have only implicit dependencies.
• Considering intra-defect associations improves defect prediction performance.
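The merging step described in the summary could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `merge_defect_modules`, the metric vectors, and the defect-to-module mapping are all hypothetical, and the record does not say how merged instances are labeled or weighted. The sketch assumes each cross-module defect yields one extra defective training instance whose variables are the per-metric mean or median of the involved modules.

```python
from statistics import mean, median


def merge_defect_modules(features, defect_modules, how="mean"):
    """Build one synthetic instance per cross-module defect.

    features:       dict mapping module id -> list of metric values
    defect_modules: dict mapping defect id -> list of module ids in its fix
    how:            "mean" or "median" (the two aggregations the summary names)
    """
    aggregate = {"mean": mean, "median": median}[how]
    merged = []
    for module_ids in defect_modules.values():
        if len(module_ids) < 2:
            continue  # a single-module defect carries no intra-defect association
        rows = [features[m] for m in module_ids]
        # Aggregate each metric column across the modules fixed together.
        merged.append([aggregate(column) for column in zip(*rows)])
    return merged


# Hypothetical toy data: [lines of code, cyclomatic complexity] per function.
features = {"f1": [10, 2], "f2": [30, 4], "f3": [50, 9], "f4": [12, 1], "f5": [8, 1]}
defect_modules = {"D1": ["f1", "f2", "f3"], "D2": ["f4"]}

# Original training set: modules touched by any defect fix are labeled 1.
defective = {m for ids in defect_modules.values() for m in ids}
X_train = [features[m] for m in features]
y_train = [1 if m in defective else 0 for m in features]

# Augment with merged instances, labeled defective (an assumption of this sketch).
extra = merge_defect_modules(features, defect_modules, how="mean")
X_augmented = X_train + extra
y_augmented = y_train + [1] * len(extra)
```

Here only defect D1 spans several functions, so the training set grows by a single merged instance; any classifier or ranking model would then be trained on the augmented data instead of the original data.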
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2023.111858