Preserving data utility via BART
Published in | Journal of Statistical Planning and Inference, Vol. 140, no. 9, pp. 2551–2561 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Kidlington: Elsevier B.V., 01.09.2010; Elsevier |
Summary: | When preparing data for public release, information organizations face the challenge of preserving the quality of data while protecting the confidentiality of both data subjects and sensitive data attributes. Without knowing what type of analyses will be conducted by data users, it is often hard to alter data without sacrificing data utility. In this paper, we propose a new approach to mitigate this difficulty, which entails using Bayesian additive regression trees (BART), in connection with existing methods for statistical disclosure limitation, to help preserve data utility while meeting confidentiality requirements. We illustrate the performance of our method through both simulation and a data example. The method works well when the targeted relationship underlying the original data is not weak, and the performance appears to be robust to the intensity of alteration. |
---|---|
ISSN: | 0378-3758 (print); 1873-1171 (online) |
DOI: | 10.1016/j.jspi.2010.03.022 |