A weighted fuzzy classification approach to identify and manipulate coincidental correct test cases for fault localization

Bibliographic Details
Published in	The Journal of systems and software Vol. 151; pp. 20 - 37
Main Authors	Liu, Yong; Li, Meiying; Wu, Yonghao; Li, Zheng
Format	Journal Article
Language	English
Published	Elsevier Inc 01.05.2019
Subjects	Coincidental correct test cases; Coverage-Based fault localization; Fuzzy classification; K-Nearest Neighbor; Software debugging
Summary:	• A proven CC test case identification approach that provides labeled samples. • A weighted fuzzy classification algorithm (FW-KNN) to identify potential CC test cases. • Three fuzzy strategies to manipulate CC test cases for CBFL. • FW-KNN is empirically shown to outperform K-means, SVM and Bayes. Locating faults effectively and accurately is an important part of software debugging. Coverage-based Fault Localization (CBFL), which has been widely studied, reduces the effort developers spend finding fault positions by using the execution information of test cases. Coincidental Correct (CC) test cases are test cases that execute the faulty statements but still produce correct output, and they have been shown to reduce the accuracy of CBFL. In this paper, we propose a weighted fuzzy classification approach to identify CC test cases, together with three fuzzy strategies to manipulate CC test cases for CBFL. First, we present a simple but efficient approach to identify some CC test cases in single-fault programs; these provide the labeled samples that make supervised classification algorithms applicable to CC identification. Then, a Fuzzy Weighted K-Nearest Neighbor (FW-KNN) algorithm is proposed to classify potential CC test cases among the passed test cases, using a “weighted” similarity measure and a “weighted” CC probability computation. Finally, three fuzzy CC test case manipulation strategies are presented to mitigate the impact of CC test cases on CBFL. Empirical studies on 190 faulty versions of 12 programs investigate the impact of the “weighted” and “fuzzy” methods on CC identification by comparing the effectiveness and efficiency of FW-KNN with three popular clustering and classification techniques. The results indicate that FW-KNN has higher accuracy and lower time cost. The Precision, Recall and False Positive Rate of FW-KNN are 96.47%, 83.40% and 2.85%, respectively. In addition, using code block coverage instead of statement coverage reduces the time cost by 72.97% on average. The experimental results also indicate that the proposed CC test case manipulation strategies improve the fault localization accuracy of CBFL.
ISSN:	0164-1212 1873-1228
DOI:	10.1016/j.jss.2019.01.056
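
The pipeline outlined in the summary (labeled CC samples, a weighted nearest-neighbor CC probability for each passed test case, then a fuzzy adjustment inside CBFL) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: Jaccard similarity stands in for the “weighted” similarity measure, a similarity-weighted vote stands in for the CC probability computation, Ochiai stands in for the CBFL formula, and discounting a passed test case by its CC probability is one hypothetical manipulation strategy. The function names and the toy coverage data are likewise hypothetical.

```python
from math import sqrt


def jaccard(a, b):
    """Similarity of two coverage sets (assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cc_probability(target, labeled, k=3):
    """Similarity-weighted k-NN estimate that `target` is a CC test case.

    labeled: list of (coverage_set, is_cc) pairs with known labels.
    """
    scored = [(jaccard(target, cov), is_cc) for cov, is_cc in labeled]
    neighbours = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return 0.0
    return sum(sim for sim, is_cc in neighbours if is_cc) / total


def weighted_ochiai(coverage, failed, pass_weight=None):
    """Ochiai suspiciousness where each passed test counts with a weight in [0, 1]."""
    if pass_weight is None:
        pass_weight = [1.0] * len(coverage)
    total_failed = sum(failed)
    scores = {}
    for s in set().union(*coverage):
        ef = sum(1 for cov, f in zip(coverage, failed) if f and s in cov)
        ep = sum(w for cov, f, w in zip(coverage, failed, pass_weight)
                 if not f and s in cov)
        denom = sqrt(total_failed * (ef + ep))
        scores[s] = ef / denom if denom else 0.0
    return scores


# Toy data (hypothetical): statement 2 is the faulty one.
coverage = [{1, 2, 4}, {1, 2, 3}, {1, 3}, {1, 2}]
failed = [True, False, False, False]
labeled = [({1, 2, 5}, True), ({1, 2, 3, 5}, True),
           ({1, 3, 5}, False), ({1, 5}, False)]

# A passed test with CC probability p keeps weight 1 - p; failed tests keep 1.
weights = [1.0 if f else 1.0 - cc_probability(cov, labeled)
           for cov, f in zip(coverage, failed)]

plain = weighted_ochiai(coverage, failed)
fuzzy = weighted_ochiai(coverage, failed, weights)
print(round(plain[2], 3), round(fuzzy[2], 3))
```

In this toy data, the passed test cases that execute statement 2 resemble the labeled CC samples, so they receive a high CC probability and contribute less to the passed-execution count. The suspiciousness of statement 2 rises as a result, which is the kind of effect the summary attributes to CC manipulation.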