Preserving data utility via BART

Bibliographic Details
Published in: Journal of Statistical Planning and Inference, Vol. 140, no. 9, pp. 2551-2561
Main Authors: Wang, Xinlei; Karr, Alan F.
Format: Journal Article
Language: English
Published: Kidlington: Elsevier B.V., 01.09.2010

Summary: When preparing data for public release, information organizations face the challenge of preserving the quality of data while protecting the confidentiality of both data subjects and sensitive data attributes. Without knowing what type of analyses will be conducted by data users, it is often hard to alter data without sacrificing data utility. In this paper, we propose a new approach to mitigate this difficulty, which entails using Bayesian additive regression trees (BART), in connection with existing methods for statistical disclosure limitation, to help preserve data utility while meeting confidentiality requirements. We illustrate the performance of our method through both simulation and a data example. The method works well when the targeted relationship underlying the original data is not weak, and the performance appears to be robust to the intensity of alteration.
ISSN: 0378-3758; 1873-1171
DOI: 10.1016/j.jspi.2010.03.022