Mean Estimation with User-level Privacy under Data Heterogeneity

Bibliographic Details
Main Authors Cummings, Rachel; Feldman, Vitaly; McMillan, Audra; Talwar, Kunal
Format Journal Article
Language English
Published 28.07.2023
Abstract A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example, in language data, where different speech styles result in data heterogeneity. In this work, we propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data, and provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
BackLink https://doi.org/10.48550/arXiv.2307.15835 (view paper in arXiv)
ContentType Journal Article
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2307.15835
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://arxiv.org/abs/2307.15835
PublicationDate 2023-07-28
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Cryptography and Security
Computer Science - Data Structures and Algorithms
Computer Science - Learning
Statistics - Machine Learning
URI https://arxiv.org/abs/2307.15835