Embedding differential privacy in decision tree algorithm with different depths

Bibliographic Details
Published in	Science China. Information sciences Vol. 60; no. 8; pp. 132 - 146
Main Authors	Bai, Xuanyu; Yao, Jianguo; Yuan, Mingxuan; Deng, Ke; Xie, Xike; Guan, Haibing
Format	Journal Article
Language	English
Published	Beijing: Science China Press; Springer Nature B.V, 01.08.2017
Subjects	Accuracy; Algorithms; Complexity; Computer Science; Data mining; Decision trees; DT model; Embedding; Information Systems and Communication Service; Markov chains; Privacy; Research Paper; Solution space; decision tree algorithm; privacy protection; prediction accuracy; exponential mechanism; MCMC; differential privacy; exhaustive search
Summary:	Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for the decision tree (DT) model has been investigated, existing research has not studied scenarios in which DP is embedded in two-step DM computation, three-step DM computation, and so on up to the whole-model DM computation. Embedding DP in more than two steps of DM computation is very challenging because the solution space grows exponentially with the computational complexity. In this work, we propose algorithms based on the Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space, to embed DP into the DT generation algorithm. We compare the performance of embedding DP in DT at different depths, i.e., one-step DM computation (previous work), two-step, three-step, and the whole model. We find that a deeper combination of DP and DT does help to increase prediction accuracy. However, when the privacy budget is very large (e.g., ε = 10), it may overwhelm the complexity of the DT model, and the increasing trend is not obvious. We also find that prediction accuracy decreases as model complexity increases.
Bibliography:	differential privacy, decision tree, exponential mechanism, exhaustive search, MCMC; 11-5847/TP
ISSN:	1674-733X 1869-1919
DOI:	10.1007/s11432-016-0442-1
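
Illustrative note:	The Summary describes using MCMC to search a solution space that is too large to enumerate, and the keywords mention the exponential mechanism. The sketch below shows one generic way these two ideas can fit together: a Metropolis sampler whose target distribution is the exponential mechanism, Pr[r] proportional to exp(ε·u(r) / (2·Δu)). It is not the paper's algorithm. The candidate space (8-bit tuples), the utility `count_ones`, the proposal `flip_one_bit`, and all parameter values are hypothetical stand-ins chosen only to keep the example runnable. Note that a chain run for finitely many steps only approximates the target distribution, so the privacy guarantee of such a sampler is likewise approximate.

```python
import math
import random


def mh_exponential_mechanism(initial, propose, utility, epsilon, sensitivity, steps, rng):
    """Approximately sample r with Pr[r] proportional to
    exp(epsilon * utility(r) / (2 * sensitivity)), using Metropolis steps.

    Assumes `propose` is symmetric: proposing b from a is as likely as a from b.
    """
    current = initial
    current_u = utility(current)
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_u = utility(candidate)
        # Log of the Metropolis acceptance ratio for a symmetric proposal.
        log_ratio = epsilon * (candidate_u - current_u) / (2.0 * sensitivity)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_u = candidate, candidate_u
    return current


def flip_one_bit(state, rng):
    """Hypothetical symmetric proposal: flip one uniformly chosen bit."""
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


def count_ones(state):
    """Hypothetical utility: number of ones in the tuple."""
    return sum(state)


rng = random.Random(0)
start = (0,) * 8
sample = mh_exponential_mechanism(
    start, flip_one_bit, count_ones,
    epsilon=10.0, sensitivity=1.0, steps=2000, rng=rng,
)
print(sample)
```

With a large privacy budget such as ε = 10, the target distribution concentrates on high-utility candidates, so the returned tuple is very likely to be mostly ones. With a small ε the chain wanders almost uniformly, which is the usual accuracy versus privacy trade-off.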