Privacy Preserving Synthetic Data Release Using Deep Learning
Published in | Machine Learning and Knowledge Discovery in Databases Vol. 11051; pp. 510 - 526 |
---|---|
Main Authors | Abay, Nazmiye Ceren; Zhou, Yan; Kantarcioglu, Murat; Thuraisingham, Bhavani; Sweeney, Latanya |
Format | Book Chapter |
Language | English |
Published | Switzerland; Springer International Publishing AG; 2019; Springer International Publishing |
Series | Lecture Notes in Computer Science |
Subjects | Data generation; Deep learning; Differential privacy |
Abstract | For many critical applications ranging from health care to social sciences, releasing personal data while protecting individual privacy is paramount. Over the years, data anonymization and synthetic data generation techniques have been proposed to address this challenge. Unfortunately, data anonymization approaches do not provide rigorous privacy guarantees. Although there are existing synthetic data generation techniques that use rigorous definitions of differential privacy, to our knowledge these techniques have not been compared extensively using different utility metrics.
In this work, we provide two novel contributions. First, we compare existing techniques on different datasets using different utility metrics. Second, we present a novel approach that utilizes deep learning techniques coupled with an efficient analysis of privacy costs to generate differentially private synthetic datasets with higher data utility. We show that we can learn deep learning models that capture relationships among multiple features, and then use these models to generate differentially private synthetic datasets. Our extensive experimental evaluation conducted on multiple datasets indicates that our proposed approach is more robust (i.e., one of the top-performing techniques on almost all types of data we experimented with) compared to the state-of-the-art methods in terms of various data utility measures. Code related to this paper is available at: https://github.com/ncabay/synthetic_generation. |
---|---|
Author | Thuraisingham, Bhavani; Sweeney, Latanya; Zhou, Yan; Kantarcioglu, Murat; Abay, Nazmiye Ceren |
Author_xml | – sequence: 1 givenname: Nazmiye Ceren orcidid: 0000-0002-7930-3455 surname: Abay fullname: Abay, Nazmiye Ceren email: nca150130@utdallas.edu – sequence: 2 givenname: Yan orcidid: 0000-0001-6122-7362 surname: Zhou fullname: Zhou, Yan – sequence: 3 givenname: Murat orcidid: 0000-0001-9795-9063 surname: Kantarcioglu fullname: Kantarcioglu, Murat – sequence: 4 givenname: Bhavani orcidid: 0000-0003-3776-3362 surname: Thuraisingham fullname: Thuraisingham, Bhavani – sequence: 5 givenname: Latanya orcidid: 0000-0003-3610-8892 surname: Sweeney fullname: Sweeney, Latanya |
ContentType | Book Chapter |
Copyright | Springer Nature Switzerland AG 2019 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2019 |
DBID | FFUUA |
DEWEY | 6.31 |
DOI | 10.1007/978-3-030-10925-7_31 |
DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9783030109257 3030109259 |
EISSN | 1611-3349 |
Editor | Berlingerio, Michele Ifrim, Georgiana Hurley, Neil Gärtner, Thomas Bonchi, Francesco |
Editor_xml | – sequence: 1 fullname: Berlingerio, Michele – sequence: 2 fullname: Ifrim, Georgiana – sequence: 3 fullname: Gärtner, Thomas – sequence: 4 fullname: Bonchi, Francesco – sequence: 5 fullname: Hurley, Neil |
EndPage | 526 |
ExternalDocumentID | EBC5921403_486_537 |
GroupedDBID | 0D6 0DA 38. AABBV AEDXK AEJLV AEKFX AEZAY AIFIR ALEXF ALMA_UNASSIGNED_HOLDINGS AYMPB BBABE CXBFT CZZ EXGDT FCSXQ FFUUA I4C IEZ MGZZY NSQWD OORQV SBO TPJZQ TSXQS Z5O Z7R Z7S Z7U Z7V Z7W Z7X Z7Y Z7Z Z81 Z82 Z83 Z84 Z85 Z87 Z88 -DT -GH -~X 1SB 29L 2HA 2HV 5QI 875 AASHB ABMNI ACGFS ADCXD AEFIE EJD F5P FEDTE HVGLF LAS LDH P2P RIG RNI RSU SVGTG VI1 ~02 |
ID | FETCH-LOGICAL-p243t-a43b039d893968291cf874202a0033b8fb04924c0b224dc0b8225a304dddc36d3 |
ISBN | 9783030109240 3030109240 |
ISSN | 0302-9743 |
IngestDate | Tue Jul 29 20:13:46 EDT 2025 Mon Apr 07 21:44:09 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | Q334-342 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-p243t-a43b039d893968291cf874202a0033b8fb04924c0b224dc0b8225a304dddc36d3 |
OCLC | 1083307216 |
ORCID | 0000-0001-6122-7362 0000-0002-7930-3455 0000-0003-3776-3362 0000-0003-3610-8892 0000-0001-9795-9063 |
PQID | EBC5921403_486_537 |
PageCount | 17 |
ParticipantIDs | springer_books_10_1007_978_3_030_10925_7_31 proquest_ebookcentralchapters_5921403_486_537 |
PublicationCentury | 2000 |
PublicationDate | 2019 |
PublicationDateYYYYMMDD | 2019-01-01 |
PublicationDate_xml | – year: 2019 text: 2019 |
PublicationDecade | 2010 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | European Conference, ECML PKDD 2018, Dublin, Ireland, September 10-14, 2018, Proceedings, Part I |
PublicationTitle | Machine Learning and Knowledge Discovery in Databases |
PublicationYear | 2019 |
Publisher | Springer International Publishing AG Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Kleinberg, Jon M. Mattern, Friedemann Naor, Moni Mitchell, John C. Terzopoulos, Demetri Steffen, Bernhard Pandu Rangan, C. Kanade, Takeo Kittler, Josef Hutchison, David Tygar, Doug |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug |
SSID | ssj0002116557 ssj0002792 |
Score | 2.329001 |
SourceID | springer proquest |
SourceType | Publisher |
StartPage | 510 |
SubjectTerms | Data generation; Deep learning; Differential privacy |
Title | Privacy Preserving Synthetic Data Release Using Deep Learning |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=5921403&ppg=537 http://link.springer.com/10.1007/978-3-030-10925-7_31 |
Volume | 11051 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV3JTsMwELWgXBAHdrHLB27IyI2d7chSQGziAIibFS-RuBREAxJ8PTNJ3CzqBS5pG8WpO8-Zzoxn3hByaMHdQh4sFkqbMmlzy7SVhgmX51YLrrnA4uS7--jqSV6_hC9NF76yuqTQx-ZnZl3Jf1CFc4ArVsn-AdnpTeEEvAd84QgIw7Fn_HbDrHWHIUyDdJ4htSo1vPEhMqTVNJieWdb1nWdFhv9Xk_YKefh4_cJm75iFgRoDqbm_x2AQIocrjsBMOdy9OaryCs6de59-WztagAVKnWiBjxb24o2tkNfJZcfDFOgycXDSeEdlglU2nKmA2zkXMJTh2JDFqlb1Hb7rsKJ76fFdj07PwjRAGkElk0jBRfNkPk7CAVk4GV3fPk8jaAEyB4XYdLGZZEWp1HxuFUvOmlPHrejthJcGxuMKWcKiE4rVIDDLVTLnxmtk2bfcoLUGXiceNNqARqegUQSN1qDREjSKoFEP2gZ5uhg9nl2xugMGew-kKFgmBTwsqQWjMo2SIB2aPIGHiwcZ9uDTSa7BwQuk4RosMQsvYO6FmeDSWmtEZMUmGYzfxm6L0DixEY9d5nhqZJ6JlGtrhrFxqdAGrLptwrwoVLlPXycHm-qHT1QPlG1y5OWl8PKJ8gTYIGglFAhalYJWKOidP959lyw2q3ePDIqPT7cP1l-hD-pl8AvcV1TU |
linkProvider | Library Specific Holdings |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=bookitem&rft.title=Machine+Learning+and+Knowledge+Discovery+in+Databases&rft.atitle=Privacy+Preserving+Synthetic+Data+Release+Using+Deep+Learning&rft.date=2019-01-01&rft.pub=Springer+International+Publishing+AG&rft.isbn=9783030109240&rft.volume=11051&rft_id=info:doi/10.1007%2F978-3-030-10925-7_31&rft.externalDBID=537&rft.externalDocID=EBC5921403_486_537 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Febookcentral.proquest.com%2Fcovers%2F5921403-l.jpg |