DP-Share: Privacy-Preserving Software Defect Prediction Model Sharing Through Differential Privacy

Bibliographic Details
Published in Journal of Computer Science and Technology Vol. 34; no. 5; pp. 1020-1038
Main Authors Chen, Xiang, Zhang, Dun, Cui, Zhan-Qi, Gu, Qing, Ju, Xiao-Lin
Format Journal Article
Language English
Published New York Springer US 01.09.2019
Springer
Springer Nature B.V
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
School of Information Science and Technology, Nantong University, Nantong 226019, China
School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
Computer School, Beijing Information Science and Technology University, Beijing 100101, China
ISSN 1000-9000
1860-4749
DOI 10.1007/s11390-019-1958-0

Summary: In current software defect prediction (SDP) research, most empirical studies use only datasets provided by the PROMISE repository, which may threaten the external validity of their results. Instead of sharing SDP datasets, sharing SDP models is a potential way to alleviate this problem and can encourage researchers and industrial practitioners to share more models. However, directly sharing models may disclose private information, for example through a model inversion attack. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing, and we propose a novel method, DP-Share, since DP mechanisms can prevent this attack when the privacy budget is carefully selected. In particular, DP-Share first preprocesses the dataset, for example by over-sampling minority instances (i.e., defective modules) and discretizing continuous features to optimize privacy budget allocation. Then, it uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees from these training sets, and these decision trees form a random forest (i.e., the model). The last phase of DP-Share uses the Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects, use AUC (area under the ROC curve) as the performance measure, and use holdout as the model validation technique. After privacy and utility analysis, we find that DP-Share achieves better performance than a baseline method, DF-Enhance, in most cases when using the same privacy budget. Moreover, we also provide guidelines for using our proposed method effectively.
Our work attempts to fill the research gap in terms of differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and then effectively advance the state of the art of SDP.
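
The summary names the Laplace and exponential mechanisms but this record does not describe how DP-Share applies them. The following is a minimal, generic Python sketch of the two textbook mechanisms, not the authors' implementation. The function names, the feature names, and the utility scores are hypothetical, and the sensitivity of 1 is an assumption that fits simple counting queries.

```python
import math
import random


def laplace_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


def exponential_choice(candidates, utility, epsilon, sensitivity=1.0, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical usage: a noisy count of defective modules in a tree leaf, and a
# private choice of split feature from made-up utility scores.
rng = random.Random(0)
noisy_defective = laplace_count(12, epsilon=0.5, rng=rng)
gains = {"loc": 0.30, "cyclomatic_complexity": 0.45, "coupling": 0.10}
split_feature = exponential_choice(list(gains), gains.get, epsilon=1.0, rng=rng)
```

A smaller epsilon gives stronger privacy but noisier counts and a more uniform choice among candidates, which is the privacy-utility trade-off the summary refers to when it says the privacy budget must be carefully selected.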