Towards A Differential Privacy and Utility Preserving Machine Learning Classifier

Bibliographic Details
Published in: Procedia Computer Science, Vol. 12, pp. 176-181
Main Authors: Mivule, Kato; Turner, Claude; Ji, Soo-Yeon
Format: Journal Article
Language: English
Published: Elsevier B.V., 2012

Summary: Many organizations transact in large amounts of data, often containing personally identifiable information (PII) and other confidential data. Such organizations are bound by state, federal, and international laws to ensure that the confidentiality of both individuals and sensitive data is not compromised. However, during the privacy preserving process the utility of such datasets diminishes even as confidentiality is achieved, a problem that has been defined as NP-hard. In this paper, we investigate a differential privacy machine learning ensemble classifier approach that seeks to preserve data privacy while maintaining an acceptable level of utility. The first step of the methodology applies a strong privacy-granting technique to a dataset using differential privacy. The resulting perturbed data is then passed through a machine learning ensemble classifier, which aims to reduce the classification error or, equivalently, to increase utility. We then examine the association between the number of weak decision tree learners and data utility, which indicates whether adding learners helps the ensemble classify more correctly. Our results show that a combined adjustment of the privacy-granting noise parameters and an increase in the number of weak learners in the ensemble may lead to a lower classification error.
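The first methodology step, perturbing the data under differential privacy, is commonly realized with the Laplace mechanism: each numeric value receives noise drawn from a Laplace distribution with scale sensitivity/epsilon. A minimal NumPy sketch of that step (the function name, toy dataset, and parameter values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Perturb each value with Laplace noise of scale sensitivity/epsilon.

    For a query with the given global sensitivity, this satisfies
    epsilon-differential privacy (illustrative sketch).
    """
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)

# Toy example: perturb a small numeric attribute before classification.
rng = np.random.default_rng(0)
data = np.array([23.0, 45.0, 31.0, 52.0])
private = laplace_mechanism(data, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller epsilon yields larger noise, hence stronger privacy but lower utility; this is the trade-off the ensemble classification step in the paper then tries to compensate for.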
ISSN: 1877-0509
DOI: 10.1016/j.procs.2012.09.050