Collaborative filtering based recommendation of sampling methods for software defect prediction

Bibliographic Details
Published in: Applied Soft Computing, Vol. 90, p. 106163
Main Authors: Sun, Zhongbin; Zhang, Jingqi; Sun, Heli; Zhu, Xiaoyan
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.05.2020

Summary: The performance of software defect prediction has been hindered by the imbalanced nature of software defect data. Fortunately, a variety of sampling methods have been employed to improve defect prediction performance. However, researchers and practitioners are usually burdened with selecting the optimal sampling methods for the defect data at hand. No single sampling method has been found to perform best in either theory or practice. Therefore, it is necessary and valuable to study how to select applicable sampling methods according to the characteristics of the current data. This paper presents a collaborative filtering based sampling methods recommendation algorithm (CFSR) for automatically recommending applicable sampling methods for new defect data. CFSR first ranks existing sampling methods on historical defect data, and then mines the similarity between the new and historical defect data using meta-features. Finally, the rankings of the sampling methods and the data similarities are combined to build a recommendation network, on which the user-based collaborative filtering algorithm is employed to recommend appropriate sampling methods for the new defect data. A thorough experiment was conducted over 20 imbalanced software defect datasets. It used five classification algorithms, two prediction performance measures, five recommendation performance measures and 12 popular sampling methods. The experimental results first demonstrate the importance and necessity of the present study, and then show that the proposed CFSR method is feasible and effective.
• We validate the importance and necessity of selecting appropriate sampling methods.
• We propose a collaborative filtering based method for selecting sampling methods.
• We demonstrate the feasibility and effectiveness of the proposed method.
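The summary outlines a three-step pipeline but leaves several details unspecified. It does not give the ranking criterion, the meta-features, the similarity measure, or the neighbourhood size. The sketch below is a minimal, non-authoritative illustration of the general idea of user-based collaborative filtering in this setting. Historical datasets play the role of "users" and sampling methods play the role of "items". Several choices here are assumptions and are not taken from the paper:

* min-max scaling of meta-features;
* similarity defined as 1 / (1 + Euclidean distance);
* the k most similar datasets used as neighbours;
* a similarity-weighted average rank used as the prediction.

The function names, method labels and meta-feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_methods(perf: Dict[str, float]) -> Dict[str, float]:
    """Rank sampling methods on one historical dataset (1 = best).

    Assumes higher performance (e.g. AUC) is better; tied methods share
    the average of the rank positions they occupy.
    """
    ordered = sorted(perf, key=lambda m: -perf[m])
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across a block of methods with the same score.
        while j + 1 < len(ordered) and perf[ordered[j + 1]] == perf[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    return ranks


def normalise(vectors: List[List[float]]) -> List[List[float]]:
    """Min-max scale each meta-feature column to [0, 1] (assumed preprocessing)."""
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in vectors
    ]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity: 1 / (1 + Euclidean distance) between meta-feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def recommend(
    hist_meta: List[List[float]],
    hist_perf: List[Dict[str, float]],
    new_meta: List[float],
    k: int = 3,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical user-based CF recommender: returns (method, predicted rank) pairs."""
    scaled = normalise(hist_meta + [new_meta])
    new_vec, hist_vecs = scaled[-1], scaled[:-1]
    sims = [similarity(new_vec, h) for h in hist_vecs]
    # Keep the k historical datasets most similar to the new dataset.
    neighbours = sorted(range(len(hist_vecs)), key=lambda i: -sims[i])[:k]
    rank_tables = [rank_methods(hist_perf[i]) for i in neighbours]
    total = sum(sims[i] for i in neighbours)  # always > 0 with this similarity
    predicted = {
        m: sum(sims[i] * r[m] for i, r in zip(neighbours, rank_tables)) / total
        for m in hist_perf[0]
    }
    # A lower predicted rank means a stronger recommendation.
    return sorted(predicted.items(), key=lambda kv: kv[1])[:top_n]


if __name__ == "__main__":
    # Illustrative meta-features: [defect ratio, number of modules] (made up).
    hist_meta = [[0.10, 500], [0.30, 2000], [0.12, 450]]
    # Illustrative AUC of three sampling methods on each historical dataset.
    hist_perf = [
        {"SMOTE": 0.78, "ROS": 0.75, "RUS": 0.70},
        {"SMOTE": 0.72, "ROS": 0.74, "RUS": 0.76},
        {"SMOTE": 0.80, "ROS": 0.77, "RUS": 0.71},
    ]
    print(recommend(hist_meta, hist_perf, [0.11, 480], k=2, top_n=2))
```

In this toy run, the first and third historical datasets are closest to the new one, and both rank SMOTE first. The sketch therefore recommends SMOTE, followed by ROS. The paper instead builds a "recommendation network" and reports several recommendation measures. A faithful reimplementation would need the exact definitions given in the full text.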
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2020.106163