High Epsilon Synthetic Data Vulnerabilities in MST and PrivBayes
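Published in | arXiv.org |
---|---|
Main Authors | Golob, Steven; Pentyala, Sikha; Maratkhan, Anuar; De Cock, Martine |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 2024-02-09 |
Subjects | Algorithms; Privacy; Synthetic data |
Online Access | https://www.proquest.com/docview/2925761832 |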
Abstract | Synthetic data generation (SDG) has become increasingly popular as a privacy-enhancing technology. It aims to preserve important statistical properties of the underlying training data while excluding any personally identifiable information. Many SDG algorithms have been developed in recent years to balance these two aims, and many of them provide robust differential privacy guarantees. However, we show that if the differential privacy parameter \(\varepsilon\) is set too high, unambiguous privacy leakage can result. We demonstrate this with a novel membership inference attack (MIA) on two state-of-the-art differentially private SDG algorithms: MST and PrivBayes. Our work suggests that these generators have previously unreported vulnerabilities, and that future work to strengthen their privacy is advisable. We present the heuristic for our MIA, which assumes knowledge of auxiliary "population" data and of which SDG algorithm was used. We use this information to adapt the recent DOMIAS MIA specifically to MST and PrivBayes. Our approach won the SNAKE challenge in November 2023. |
---|---|
Author | Golob, Steven Maratkhan, Anuar Pentyala, Sikha De Cock, Martine |
ContentType | Paper |
Copyright | 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
PublicationDate | 2024-02-09 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
Publisher | Cornell University Library, arXiv.org |
SubjectTerms | Algorithms; Privacy; Synthetic data |
URI | https://www.proquest.com/docview/2925761832 |
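
The abstract refers to the DOMIAS membership inference attack, which scores a target record by the ratio of its estimated density under the synthetic data to its estimated density under reference "population" data; a higher ratio suggests the record was in the training set. The following is a minimal sketch of that generic density-ratio idea only, not the authors' adaptation to MST and PrivBayes, which this record does not describe. The function names, the Laplace-smoothed frequency estimator, and the toy records are all illustrative assumptions.

```python
from collections import Counter


def smoothed_density(data, support_size, alpha=1.0):
    """Return a function estimating P(record) from Laplace-smoothed counts.

    This simple frequency estimator is an assumption made for illustration;
    any density estimator could be substituted.
    """
    counts = Counter(data)
    total = len(data) + alpha * support_size

    def density(record):
        return (counts[record] + alpha) / total

    return density


def density_ratio_scores(targets, synthetic, reference, alpha=1.0):
    """Score each target by p_synthetic(x) / p_reference(x).

    Higher scores mean the synthetic data is denser around the target than
    the reference population is, which is treated as evidence of membership.
    """
    # Shared support so both estimators are smoothed over the same outcomes.
    support_size = len(set(targets) | set(synthetic) | set(reference))
    p_syn = smoothed_density(synthetic, support_size, alpha)
    p_ref = smoothed_density(reference, support_size, alpha)
    return [p_syn(t) / p_ref(t) for t in targets]


# Toy categorical records (hypothetical data, for illustration only).
synthetic = [("a", 1)] * 8 + [("b", 0)] * 2
reference = [("a", 1)] * 2 + [("b", 0)] * 8
targets = [("a", 1), ("b", 0)]
scores = density_ratio_scores(targets, synthetic, reference)
```

In this toy example the record `("a", 1)` is over-represented in the synthetic data relative to the reference data, so it receives a score above 1, while `("b", 0)` receives a score below 1. An attacker would then threshold or rank these scores to guess membership.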