Federated Forest
Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. Such data islands situ...
Published in | IEEE Transactions on Big Data, Vol. 8, no. 3, pp. 843-854 |
---|---|
Main Authors | Liu, Yang; Liu, Yingting; Liu, Zhijie; Liang, Yuxuan; Meng, Chuishi; Zhang, Junbo; Zheng, Yu |
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022-06-01 |
ISSN | 2332-7790, 2372-2096 |
DOI | 10.1109/TBDATA.2020.2992755 |
Abstract | Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. This data-island situation and the need for data privacy and security are two major challenges for applications of artificial intelligence. In this article, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless counterpart of the traditional random forest method, i.e., it achieves the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system in which a model is jointly trained over clients in different regions that hold the same user samples but different attribute sets, processing the data stored at each client without exchanging raw data. We also propose a novel prediction algorithm that substantially reduces the communication overhead. Experiments on both real-world and UCI data sets demonstrate that Federated Forest is as accurate as the non-federated version. The efficiency and robustness of the proposed system have also been verified. Overall, our model is practical, scalable, and extensible for real-life tasks. |
---|---|
Author | 1. Liu, Yang (ORCID 0000-0002-8428-6039; liuyang21cn@outlook.com), JD Digits, JD Intelligent Cities Business Unit, Beijing, China; 2. Liu, Yingting (yingting6@outlook.com), University of Science and Technology of China, Hefei, Anhui, China; 3. Liu, Zhijie (zhijie_6@163.com), Beijing Normal University, Beijing, China; 4. Liang, Yuxuan (yuxliang@outlook.com), School of Computing, National University of Singapore, Singapore; 5. Meng, Chuishi (ORCID 0000-0002-1995-5291; chuishimeng@gmail.com), JD Digits, JD Intelligent Cities Business Unit, Beijing, China; 6. Zhang, Junbo (ORCID 0000-0001-5947-1374; msjunbozhang@outlook.com), JD Digits, JD Intelligent Cities Business Unit, Beijing, China; 7. Zheng, Yu (ORCID 0000-0002-5224-4344; msyuzheng@outlook.com), JD Digits, JD Intelligent Cities Business Unit, Beijing, China |
CODEN | ITBDAX |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
Discipline | Computer Science |
EISSN | 2372-2096 |
Genre | orig-research |
GrantInformation | Funder: National Key R&D Program of China; grant ID: 2019YFB2101805 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
SubjectTerms | Algorithms; Artificial intelligence; Companies; Cryptography; Cybersecurity; Data mining; Data models; General Data Protection Regulation; Machine learning; Privacy; Regional development |
URI | https://ieeexplore.ieee.org/document/9088965 https://www.proquest.com/docview/2663643631 |