Generating Private Synthetic Data with Genetic Algorithms
Main Authors | Liu, Terrance; Tang, Jingwu; Vietri, Giuseppe; Wu, Zhiwei Steven |
---|---|
Format | Journal Article |
Language | English |
Published | 05.06.2023 |
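Subjects | Computer Science - Cryptography and Security; Computer Science - Learning; Computer Science - Neural and Evolutionary Computing |
Online Access | https://arxiv.org/abs/2306.03257 |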
Abstract | We study the problem of efficiently generating differentially private
synthetic data that approximate the statistical properties of an underlying
sensitive dataset. In recent years, a growing line of work has approached this
problem using first-order optimization techniques. However, such techniques
can optimize only differentiable objectives, which severely limits the types of
analyses that can be conducted. For example, first-order mechanisms have been
successful mainly at approximating statistical queries in the form of marginals
over discrete data domains. In some cases, one can circumvent such issues by
relaxing the task's objective to maintain differentiability. However, even when
possible, such relaxations impose a fundamental limitation: the modifications to
the minimization problem become additional sources of error. We therefore
propose Private-GSD, a private genetic algorithm based on zeroth-order
optimization heuristics that do not require modifying the original objective.
As a result, it avoids the aforementioned limitations of first-order
optimization. We empirically evaluate Private-GSD against baseline algorithms
on data derived from the American Community Survey across a variety of
statistics (otherwise known as statistical queries), for both discrete and
real-valued attributes. We show that Private-GSD outperforms state-of-the-art
methods on non-differentiable queries while matching their accuracy in
approximating differentiable ones. |
---|---|
Author | Vietri, Giuseppe; Liu, Terrance; Tang, Jingwu; Wu, Zhiwei Steven |
Author_xml | – sequence: 1 givenname: Terrance surname: Liu fullname: Liu, Terrance – sequence: 2 givenname: Jingwu surname: Tang fullname: Tang, Jingwu – sequence: 3 givenname: Giuseppe surname: Vietri fullname: Vietri, Giuseppe – sequence: 4 givenname: Zhiwei Steven surname: Wu fullname: Wu, Zhiwei Steven |
BackLink | https://doi.org/10.48550/arXiv.2306.03257 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://creativecommons.org/licenses/by/4.0 |
Copyright_xml | – notice: http://creativecommons.org/licenses/by/4.0 |
DBID | AKY GOX |
DOI | 10.48550/arxiv.2306.03257 |
DatabaseName | arXiv Computer Science; arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2306_03257 |
GroupedDBID | AKY GOX |
ID | FETCH-LOGICAL-a677-a9a4a520ba6b58d2c058fcda46057565982453a7a5e23535abfd17096b001a303 |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:39:57 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a677-a9a4a520ba6b58d2c058fcda46057565982453a7a5e23535abfd17096b001a303 |
OpenAccessLink | https://arxiv.org/abs/2306.03257 |
ParticipantIDs | arxiv_primary_2306_03257 |
PublicationCentury | 2000 |
PublicationDate | 2023-06-05 |
PublicationDateYYYYMMDD | 2023-06-05 |
PublicationDate_xml | – month: 06 year: 2023 text: 2023-06-05 day: 05 |
PublicationDecade | 2020 |
PublicationYear | 2023 |
Score | 1.8823817 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Cryptography and Security; Computer Science - Learning; Computer Science - Neural and Evolutionary Computing |
Title | Generating Private Synthetic Data with Genetic Algorithms |
URI | https://arxiv.org/abs/2306.03257 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Cornell University |
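The abstract contrasts gradient-based methods with a zeroth-order genetic search. The sketch below illustrates that contrast. It is a generic genetic algorithm and not the authors' Private-GSD, and it rests on several assumptions:

- Threshold (CDF-style) queries on real-valued attributes in [0, 1] serve as the example of non-differentiable statistical queries.
- Private answers come from the Gaussian mechanism with an uncalibrated noise scale `sigma`.
- Mean squared error is the fitness function.
- All helper names (`answers`, `noisy_answers`, `fitness`, `mutate`, `crossover`, `genetic_search`) are hypothetical.

The search reads only the noisy answers, never the sensitive rows. Under that assumption it is post-processing of a differentially private release.

```python
import random

# Hypothetical sketch of a zeroth-order genetic search for synthetic data.
# NOT the authors' Private-GSD: query class, objective and operators are assumptions.


def answers(data, queries):
    """Fraction of rows with row[attr] <= t for each threshold query (attr, t).

    The indicator 1[x <= t] is a step function with zero gradient almost
    everywhere, so a first-order optimizer gets no signal from these queries."""
    n = len(data)
    return [sum(1 for row in data if row[attr] <= t) / n for attr, t in queries]


def noisy_answers(data, queries, sigma, rng):
    """Gaussian mechanism: add N(0, sigma^2) noise to each true answer.
    Assumption: sigma has been calibrated to the privacy budget elsewhere."""
    return [a + rng.gauss(0.0, sigma) for a in answers(data, queries)]


def fitness(synth, queries, target):
    """Assumed objective: mean squared error against the noisy targets (lower is better)."""
    syn = answers(synth, queries)
    return sum((a - b) ** 2 for a, b in zip(syn, target)) / len(target)


def mutate(parent, rng, scale=0.1):
    """Copy the parent and perturb one attribute of one row, clipped to [0, 1]."""
    child = [row[:] for row in parent]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[i]))
    child[i][j] = min(1.0, max(0.0, child[i][j] + rng.gauss(0.0, scale)))
    return child


def crossover(a, b, rng):
    """Build a child by taking each row from one of the two parents at random."""
    return [(ra if rng.random() < 0.5 else rb)[:] for ra, rb in zip(a, b)]


def genetic_search(target, queries, n_rows, n_attrs, rng,
                   pop_size=20, n_elite=5, generations=200):
    """Uses only fitness values (zeroth order), never gradients.
    Returns the best synthetic dataset and the best fitness seen per generation."""
    population = [[[rng.random() for _ in range(n_attrs)] for _ in range(n_rows)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(population, key=lambda s: fitness(s, queries, target))
        elites = scored[:n_elite]  # elitism: the best candidates survive unchanged
        history.append(fitness(elites[0], queries, target))
        children = []
        for _ in range(pop_size - n_elite):
            parent = crossover(rng.choice(elites), rng.choice(elites), rng)
            children.append(mutate(parent, rng))
        population = elites + children
    best = min(population, key=lambda s: fitness(s, queries, target))
    return best, history


# Usage on made-up data: 500 "sensitive" rows with two attributes in [0, 1].
rng = random.Random(0)
real = [[rng.random() ** 2, rng.random()] for _ in range(500)]
queries = [(j, t / 10) for j in range(2) for t in range(1, 10)]
target = noisy_answers(real, queries, sigma=0.01, rng=rng)  # only private output is used below
synth, history = genetic_search(target, queries, n_rows=30, n_attrs=2, rng=rng)
print(f"best MSE: first generation {history[0]:.5f}, last generation {history[-1]:.5f}")
```

Elitism carries the best candidate into each new generation, so the recorded best fitness never increases. Because the search only evaluates the objective, the step-function queries are matched directly, with no differentiable relaxation.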