Federated statistical analysis: non-parametric testing and quantile estimation
Published in | Frontiers in applied mathematics and statistics Vol. 9 |
---|---|
Main Authors | Becher, Ori; Marcus-Kalish, Mira; Steinberg, David M. |
Format | Journal Article |
Language | English |
Published | Frontiers Media S.A, 13.11.2023 |
Subjects | federated analysis; information loss; Mann-Whitney test; medical informatics; privacy preservation |
ISSN | 2297-4687 |
DOI | 10.3389/fams.2023.1267034 |
Abstract | The age of big data has fueled expectations for accelerating learning. The availability of large data sets enables researchers to achieve more powerful statistical analyses and enhances the reliability of conclusions, which can be based on a broad collection of subjects. Often such data sets can be assembled only with access to diverse sources; for example, medical research that combines data from multiple centers in a federated analysis. However, these hopes must be balanced against data privacy concerns, which hinder sharing raw data among centers. Consequently, federated analyses typically resort to sharing data summaries from each center. The limitation to summaries carries the risk that it will impair the efficiency of statistical analysis procedures. In this work, we take a close look at the effects of federated analysis on two very basic problems: non-parametric comparison of two groups and quantile estimation to describe the corresponding distributions. We also propose a specific privacy-preserving data release policy for federated analysis with the K-anonymity criterion, which has been adopted by the Medical Informatics Platform of the European Human Brain Project. Our results show that, for our tasks, there is only a modest loss of statistical efficiency. |
---|---|
Author | Marcus-Kalish, Mira; Steinberg, David M.; Becher, Ori |
Author_xml | – sequence: 1 givenname: Ori surname: Becher fullname: Becher, Ori – sequence: 2 givenname: Mira surname: Marcus-Kalish fullname: Marcus-Kalish, Mira – sequence: 3 givenname: David M. surname: Steinberg fullname: Steinberg, David M. |
Discipline | Applied Sciences |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
OpenAccessLink | https://doaj.org/article/101a3af269684550a71a4253f9c00851 |
SubjectTerms | federated analysis; information loss; Mann-Whitney test; medical informatics; privacy preservation |