Federated Forest

Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. Such data islands situ...

Bibliographic Details
Published in IEEE Transactions on Big Data Vol. 8; no. 3; pp. 843-854
Main Authors Liu, Yang; Liu, Yingting; Liu, Zhijie; Liang, Yuxuan; Meng, Chuishi; Zhang, Junbo; Zheng, Yu
Format Journal Article
Language English
Published Piscataway IEEE 01.06.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
ISSN 2332-7790
2372-2096
DOI 10.1109/TBDATA.2020.2992755


Abstract Most real-world data are scattered across different companies or government organizations and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China's Cyber Security Law. This data-island situation and the need for data privacy and security are two major challenges for applications of artificial intelligence. In this article, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless version of the traditional random forest method, i.e., it achieves the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system that allows a model to be jointly trained over clients in different regions that hold the same user samples but different attribute sets, processing the data stored at each client without exchanging raw data. We also propose a novel prediction algorithm that can largely reduce the communication overhead. Experiments on both real-world and UCI data sets demonstrate that Federated Forest is as accurate as the non-federated version. The efficiency and robustness of the proposed system have also been verified. Overall, our model is practical, scalable, and extensible for real-life tasks.
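The setting described in the abstract (clients holding the same samples but different attribute sets, with no raw data exchanged) can be illustrated with a minimal sketch. This is not the authors' protocol: it assumes, purely for illustration, that every party can see the labels, and it leaves out encryption, sample alignment, and the paper's prediction algorithm. The names `gini`, `best_local_split`, and `coordinator_pick` are hypothetical. Each party scores candidate splits on its own features and reports only the best gain; a coordinator then chooses which party owns the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_local_split(features, labels):
    """Find the best split using only this party's own feature columns.

    features: dict mapping feature name -> list of values aligned with labels.
    Returns (gain, feature_name, threshold); raw values never leave the party.
    """
    parent = gini(labels)
    n = len(labels)
    best = (0.0, None, None)
    for name, column in features.items():
        # Candidate thresholds: every distinct value except the largest.
        for threshold in sorted(set(column))[:-1]:
            left = [y for x, y in zip(column, labels) if x <= threshold]
            right = [y for x, y in zip(column, labels) if x > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if gain > best[0]:
                best = (gain, name, threshold)
    return best


def coordinator_pick(reported_gains):
    """Coordinator sees only {party_id: gain} and picks the winning party."""
    return max(reported_gains, key=reported_gains.get)


# Toy data: four shared samples, each party holds a different attribute.
labels = [0, 0, 1, 1]                   # assumption: labels visible to all parties
party_a = {"age": [25, 30, 45, 50]}     # hypothetical feature held by party A
party_b = {"visits": [3, 1, 2, 1]}      # hypothetical feature held by party B

split_a = best_local_split(party_a, labels)
split_b = best_local_split(party_b, labels)
winner = coordinator_pick({"A": split_a[0], "B": split_b[0]})
```

In this toy run the coordinator learns only two numbers (the gains), while the feature name and threshold of the chosen split stay with the winning party. How the real system protects labels and intermediate results is described in the full article, not here.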
Author_xml – sequence: 1
  givenname: Yang
  orcidid: 0000-0002-8428-6039
  surname: Liu
  fullname: Liu, Yang
  email: liuyang21cn@outlook.com
  organization: JD Digits, JD Intelligent Cities Business Unit, Beijing, China
– sequence: 2
  givenname: Yingting
  surname: Liu
  fullname: Liu, Yingting
  email: yingting6@outlook.com
  organization: University of Science and Technology of China, Hefei, Anhui, China
– sequence: 3
  givenname: Zhijie
  surname: Liu
  fullname: Liu, Zhijie
  email: zhijie_6@163.com
  organization: Beijing Normal University, Beijing, China
– sequence: 4
  givenname: Yuxuan
  surname: Liang
  fullname: Liang, Yuxuan
  email: yuxliang@outlook.com
  organization: School of Computing, National University of Singapore, Singapore
– sequence: 5
  givenname: Chuishi
  orcidid: 0000-0002-1995-5291
  surname: Meng
  fullname: Meng, Chuishi
  email: chuishimeng@gmail.com
  organization: JD Digits, JD Intelligent Cities Business Unit, Beijing, China
– sequence: 6
  givenname: Junbo
  orcidid: 0000-0001-5947-1374
  surname: Zhang
  fullname: Zhang, Junbo
  email: msjunbozhang@outlook.com
  organization: JD Digits, JD Intelligent Cities Business Unit, Beijing, China
– sequence: 7
  givenname: Yu
  orcidid: 0000-0002-5224-4344
  surname: Zheng
  fullname: Zheng, Yu
  email: msyuzheng@outlook.com
  organization: JD Digits, JD Intelligent Cities Business Unit, Beijing, China
CODEN ITBDAX
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Discipline Computer Science
EISSN 2372-2096
EndPage 854
ExternalDocumentID 10_1109_TBDATA_2020_2992755
9088965
Genre orig-research
GrantInformation_xml – fundername: National Key R&D Program of China
  grantid: 2019YFB2101805
ISSN 2332-7790
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PageCount 12
PublicationDate 2022-06-01
PublicationDateYYYYMMDD 2022-06-01
PublicationDate_xml – month: 06
  year: 2022
  text: 2022-06-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE transactions on big data
PublicationTitleAbbrev TBData
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SubjectTerms Algorithms
Artificial intelligence
Companies
Cryptography
Cybersecurity
Data mining
Data models
General Data Protection Regulation
Machine learning
Privacy
Regional development
Title Federated Forest
URI https://ieeexplore.ieee.org/document/9088965
https://www.proquest.com/docview/2663643631
Volume 8