Analysis and modeling conditional mutual dependency of metrics in software defect prediction using latent variables

Bibliographic Details
Published in	Neurocomputing (Amsterdam) Vol. 460; pp. 309 - 330
Main Authors	Harzevili, Nima Shiri; Alizadeh, Sasan H.
Format	Journal Article
Language	English
Published	Elsevier B.V., 14.10.2021
Subjects	Latent variable; Naive Bayes classifier; Software defect prediction; Software metrics

More Information
Summary:	Software defect prediction is an important discipline in the software development life cycle. Among the techniques employed in this domain, the Naive Bayes (NB) classifier is cited by a large number of researchers for its simple structure and remarkable classification performance, notwithstanding the concern of whether it is theoretically justified. More precisely, NB is built on the strong assumption that attributes are conditionally independent given the class, and the major question here is whether software metrics comply with this assumption. To address this question, we propose a novel framework, “MLMNB-SDP”, equipped with a statistical hypothesis testing method to detect software metrics with a significant conditional dependency. MLMNB-SDP is designed to handle conditional dependencies via a single latent variable in a predefined structure, which is responsible for preserving the connection between pairs of software metrics when the class variables are instantiated. We evaluate the effectiveness of our approach based on its capability to measure the conditional dependency of software metrics and on its defect prediction performance. For the former, we employ Conditional Mutual Information (CMI); for the latter, we use three settings for defect prediction: (1) Within-Project Defect Prediction (WPDP), (2) Cross-Project Defect Prediction (CPDP), and (3) stratified k-fold cross-validation. Our metrics dependency analysis indicates that traditional file-level software metrics demonstrate a significant conditional mutual dependency and that the application of the naive Bayes classifier in this domain is not theoretically acceptable. Our results based on the three settings indicate that MLMNB-SDP improves on the naive Bayes classifier by 5.45% to 75.86% and outperforms well-known benchmark classifiers, i.e., Random Forest and Logistic Regression, with significant increases in Precision, Recall, F1 score, Matthews Correlation Coefficient (MCC), and area under the ROC curve (AUC).
ISSN:	0925-2312 1872-8286
DOI:	10.1016/j.neucom.2021.05.043
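
Illustrative note: the summary names Conditional Mutual Information (CMI) as the measure of dependency between pairs of software metrics once the class is fixed. The sketch below shows one common way to estimate I(X; Y | C) for discretized metrics, using a plain plug-in (frequency count) estimator. It is not the authors' implementation: this record does not describe their estimator or their hypothesis test, and the function name, variable names, and toy data are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, c):
    """Plug-in estimate of I(X; Y | C) in bits for discrete sequences.

    x, y: two discretized metrics (hypothetical, e.g. binned LOC and complexity)
    c:    class label per sample (e.g. 0 = clean, 1 = defective)
    """
    n = len(x)
    if n == 0 or not (n == len(y) == len(c)):
        raise ValueError("x, y and c must be non-empty and of equal length")

    n_xyc = Counter(zip(x, y, c))  # joint counts of (x, y, c)
    n_xc = Counter(zip(x, c))      # counts of (x, c)
    n_yc = Counter(zip(y, c))      # counts of (y, c)
    n_c = Counter(c)               # counts of c

    cmi = 0.0
    for (xi, yi, ci), count in n_xyc.items():
        # p(x,y,c) * log2[ p(x,y|c) / (p(x|c) * p(y|c)) ], expressed in raw counts
        ratio = count * n_c[ci] / (n_xc[(xi, ci)] * n_yc[(yi, ci)])
        cmi += (count / n) * log2(ratio)
    return cmi


# Toy data (assumed, not from the paper): 8 files, two binned metrics, one label.
defect = [0, 0, 0, 0, 1, 1, 1, 1]
loc_bin = [0, 0, 1, 1, 0, 0, 1, 1]
cc_bin = [0, 0, 1, 1, 0, 0, 1, 1]     # identical to loc_bin: fully dependent
churn_bin = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of loc_bin within each class

dependent_cmi = conditional_mutual_information(loc_bin, cc_bin, defect)
independent_cmi = conditional_mutual_information(loc_bin, churn_bin, defect)
print(f"I(loc; cc | defect)    = {dependent_cmi:.3f} bits")
print(f"I(loc; churn | defect) = {independent_cmi:.3f} bits")
```

A CMI of zero is what the Naive Bayes independence assumption requires for every pair of metrics; a clearly positive value, as for the first pair in the toy data, signals the kind of conditional dependency the summary describes. On real data the plug-in estimate is biased upward for small samples, so a significance test of the sort the summary mentions would be needed before drawing conclusions.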