Privacy Preserving Synthetic Data Release Using Deep Learning

Bibliographic Details
Published in Machine Learning and Knowledge Discovery in Databases Vol. 11051; pp. 510 - 526
Main Authors Abay, Nazmiye Ceren; Zhou, Yan; Kantarcioglu, Murat; Thuraisingham, Bhavani; Sweeney, Latanya
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2019
Springer International Publishing
Series Lecture Notes in Computer Science
Abstract For many critical applications ranging from health care to social sciences, releasing personal data while protecting individual privacy is paramount. Over the years, data anonymization and synthetic data generation techniques have been proposed to address this challenge. Unfortunately, data anonymization approaches do not provide rigorous privacy guarantees. Although there are existing synthetic data generation techniques that use rigorous definitions of differential privacy, to our knowledge, these techniques have not been compared extensively using different utility metrics. In this work, we provide two novel contributions. First, we compare existing techniques on different datasets using different utility metrics. Second, we present a novel approach that utilizes deep learning techniques coupled with an efficient analysis of privacy costs to generate differentially private synthetic datasets with higher data utility. We show that we can learn deep learning models that capture relationships among multiple features, and then use these models to generate differentially private synthetic datasets. Our extensive experimental evaluation conducted on multiple datasets indicates that our proposed approach is more robust (i.e., one of the top-performing techniques on almost all types of data we experimented with) compared to state-of-the-art methods in terms of various data utility measures. Code related to this paper is available at: https://github.com/ncabay/synthetic_generation.
Author Thuraisingham, Bhavani
Sweeney, Latanya
Zhou, Yan
Kantarcioglu, Murat
Abay, Nazmiye Ceren
Author_xml – sequence: 1
  givenname: Nazmiye Ceren
  orcidid: 0000-0002-7930-3455
  surname: Abay
  fullname: Abay, Nazmiye Ceren
  email: nca150130@utdallas.edu
– sequence: 2
  givenname: Yan
  orcidid: 0000-0001-6122-7362
  surname: Zhou
  fullname: Zhou, Yan
– sequence: 3
  givenname: Murat
  orcidid: 0000-0001-9795-9063
  surname: Kantarcioglu
  fullname: Kantarcioglu, Murat
– sequence: 4
  givenname: Bhavani
  orcidid: 0000-0003-3776-3362
  surname: Thuraisingham
  fullname: Thuraisingham, Bhavani
– sequence: 5
  givenname: Latanya
  orcidid: 0000-0003-3610-8892
  surname: Sweeney
  fullname: Sweeney, Latanya
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2019
Copyright_xml – notice: Springer Nature Switzerland AG 2019
DBID FFUUA
DEWEY 006.31
DOI 10.1007/978-3-030-10925-7_31
Discipline Computer Science
EISBN 9783030109257
3030109259
EISSN 1611-3349
Editor Berlingerio, Michele
Ifrim, Georgiana
Hurley, Neil
Gärtner, Thomas
Bonchi, Francesco
Editor_xml – sequence: 1
  fullname: Berlingerio, Michele
– sequence: 2
  fullname: Ifrim, Georgiana
– sequence: 3
  fullname: Gärtner, Thomas
– sequence: 4
  fullname: Bonchi, Francesco
– sequence: 5
  fullname: Hurley, Neil
EndPage 526
ISBN 9783030109240
3030109240
ISSN 0302-9743
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
Language English
LinkModel OpenURL
OCLC 1083307216
ORCID 0000-0001-6122-7362
0000-0002-7930-3455
0000-0003-3776-3362
0000-0003-3610-8892
0000-0001-9795-9063
PQID EBC5921403_486_537
PageCount 17
ParticipantIDs springer_books_10_1007_978_3_030_10925_7_31
proquest_ebookcentralchapters_5921403_486_537
PublicationCentury 2000
PublicationDate 2019
PublicationDateYYYYMMDD 2019-01-01
PublicationDate_xml – year: 2019
  text: 2019
PublicationDecade 2010
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I
PublicationTitle Machine Learning and Knowledge Discovery in Databases
PublicationYear 2019
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Steffen, Bernhard
Pandu Rangan, C.
Kanade, Takeo
Kittler, Josef
Hutchison, David
Tygar, Doug
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
– sequence: 8
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
– sequence: 9
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 10
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
– sequence: 11
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
SourceID springer
proquest
SourceType Publisher
StartPage 510
SubjectTerms Data generation
Deep learning
Differential privacy
Title Privacy Preserving Synthetic Data Release Using Deep Learning
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5921403&ppg=537
http://link.springer.com/10.1007/978-3-030-10925-7_31
Volume 11051