Federated statistical analysis: non-parametric testing and quantile estimation

Bibliographic Details
Published in Frontiers in applied mathematics and statistics Vol. 9
Main Authors Becher, Ori; Marcus-Kalish, Mira; Steinberg, David M.
Format Journal Article
Language English
Published Frontiers Media S.A. 13.11.2023
ISSN 2297-4687
DOI 10.3389/fams.2023.1267034

Abstract The age of big data has fueled expectations for accelerated learning. Large data sets enable more powerful statistical analyses and more reliable conclusions, because they can be based on a broad collection of subjects. Often such data sets can be assembled only by drawing on diverse sources, as in medical research that combines data from multiple centers in a federated analysis. However, these hopes must be balanced against data privacy concerns, which hinder the sharing of raw data among centers. Consequently, federated analyses typically resort to sharing data summaries from each center. Restricting the analysis to summaries risks impairing the efficiency of statistical procedures. In this work, we take a close look at the effects of federated analysis on two very basic problems: non-parametric comparison of two groups, and quantile estimation to describe the corresponding distributions. We also propose a specific privacy-preserving data release policy for federated analysis with the K-anonymity criterion, which has been adopted by the Medical Informatics Platform of the European Human Brain Project. Our results show that, for our tasks, there is only a modest loss of statistical efficiency.
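To make the idea of analysing shared summaries concrete, the following is a minimal illustrative sketch of federated quantile estimation from binned counts. It is not the release policy proposed in the article, which this record does not describe. The function names `center_summary` and `pooled_quantile`, the shared bin edges, and the rule of suppressing any bin holding fewer than `k` subjects are all assumptions made only for illustration.

```python
import numpy as np

def center_summary(values, edges, k):
    """Hypothetical per-center release: histogram counts over shared bin edges.

    Assumed rule (not taken from the article): any non-empty bin with
    fewer than k subjects is suppressed by setting its count to zero.
    """
    counts, _ = np.histogram(values, bins=edges)
    counts = counts.copy()
    counts[(counts > 0) & (counts < k)] = 0
    return counts

def pooled_quantile(pooled_counts, edges, q):
    """Approximate the q-quantile from pooled bin counts.

    Builds the empirical CDF at the bin edges and interpolates linearly,
    which treats observations as uniformly spread within each bin.
    """
    cdf = np.concatenate([[0.0], np.cumsum(pooled_counts)]) / pooled_counts.sum()
    return float(np.interp(q, cdf, edges))

# Toy usage with two centers and made-up data.
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
center_a = center_summary([0.5] * 5 + [1.5] * 5, edges, k=5)
center_b = center_summary([0.2] + [2.5] * 5 + [3.5] * 5, edges, k=5)
median_estimate = pooled_quantile(center_a + center_b, edges, 0.5)
print(median_estimate)
```

Only the count vectors leave each center in this sketch. The single observation that center B holds in the first bin is suppressed, which is one simple way that limiting a release to summaries can cost some statistical information.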
Open Access Yes
Peer Reviewed Yes
Subjects federated analysis; information loss; Mann-Whitney test; medical informatics; privacy preservation
URI https://doaj.org/article/101a3af269684550a71a4253f9c00851