Towards A Differential Privacy and Utility Preserving Machine Learning Classifier

Bibliographic Details
Published in: Procedia Computer Science, Vol. 12, pp. 176-181
Main Authors: Mivule, Kato; Turner, Claude; Ji, Soo-Yeon
Format: Journal Article
Language: English
Published: Elsevier B.V., 2012

Summary: Many organizations transact in large amounts of data, often containing personally identifiable information (PII) and other confidential data. Such organizations are bound by state, federal, and international laws to ensure that the confidentiality of both individuals and sensitive data is not compromised. However, during the privacy preserving process the utility of such datasets diminishes even as confidentiality is achieved, a problem that has been defined as NP-hard. In this paper, we investigate a differential privacy machine learning ensemble classifier approach that seeks to preserve data privacy while maintaining an acceptable level of utility. The first step of the methodology applies a strong privacy-granting technique to a dataset using differential privacy. The resulting perturbed data is then passed through a machine learning ensemble classifier, which aims to reduce the classification error or, equivalently, to increase utility. We then examine the association between the number of weak decision tree learners and data utility, which indicates whether adding learners helps the ensemble classify more correctly. Our results show that a combined adjustment of the privacy-granting noise parameters and an increase in the number of weak learners in the ensemble may lead to a lower classification error.
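The first methodology step, perturbing the data under differential privacy, is commonly realized with the Laplace mechanism: each numeric value receives noise drawn from a Laplace distribution with scale sensitivity/epsilon. A minimal NumPy sketch of that step (the function name, toy dataset, and parameter values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Perturb each value with Laplace noise of scale sensitivity/epsilon.

    For a query with the given global sensitivity, this satisfies
    epsilon-differential privacy (illustrative sketch).
    """
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)

# Toy example: perturb a small numeric attribute before classification.
rng = np.random.default_rng(0)
data = np.array([23.0, 45.0, 31.0, 52.0])
private = laplace_mechanism(data, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller epsilon yields larger noise, hence stronger privacy but lower utility; this is the trade-off the ensemble classification step in the paper then tries to compensate for.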
ISSN: 1877-0509
DOI: 10.1016/j.procs.2012.09.050