FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning
| Published in | Machine Learning and Knowledge Discovery in Databases, pp. 348-363 |
|---|---|
| Main Authors | |
| Format | Book Chapter |
| Language | English |
| Published | Cham: Springer International Publishing, 2021 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| Summary | In this paper, we identify a new phenomenon called activation divergence, which occurs in Federated Learning (FL) due to data heterogeneity (i.e., data being non-IID) across multiple users. Specifically, we argue that the activation vectors in FL can diverge, even if subsets of users share a few common classes with data residing on different devices. To address the activation-divergence issue, we introduce a prior based on the principle of maximum entropy; this prior assumes minimal information about the per-device activation vectors and aims at making the activation vectors of the same classes as similar as possible across multiple devices. Our results show that, for both IID and non-IID settings, our proposed approach results in better accuracy (due to the significantly more similar activation vectors across multiple devices) and is more communication-efficient than state-of-the-art approaches in FL. Finally, we illustrate the effectiveness of our approach on a few common benchmarks and two large medical datasets (the code is available at https://github.com/weichennone/FedMAX). |
|---|---|
| Bibliography | W. Chen and K. Bhardwaj contributed equally. |
| ISBN | 3030676609, 9783030676605 |
| ISSN | 0302-9743, 1611-3349 |
| DOI | 10.1007/978-3-030-67661-2_21 |
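
The summary above describes a prior based on the principle of maximum entropy over the per-device activation vectors. A common way to realize such a prior is to add a KL term that pulls the softmax of the pre-classifier activations toward a uniform distribution, which is equivalent to maximizing their entropy. The sketch below is illustrative only and is not the authors' reference implementation from the linked repository; the function name `fedmax_style_loss`, the activation tensor `acts`, and the weight `beta` are assumptions made for this example.

```python
# Minimal sketch of a maximum-entropy regularizer on the activations
# feeding the final classifier layer (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

def fedmax_style_loss(logits, labels, acts, beta=1.0):
    """Cross-entropy plus a term that pushes softmax(acts) toward uniform,
    i.e. maximizes the entropy of the activation distribution.

    logits: classifier outputs, shape (batch, num_classes)
    labels: ground-truth class indices, shape (batch,)
    acts:   pre-classifier activation vectors, shape (batch, dim)
    beta:   hypothetical weight on the maximum-entropy term
    """
    ce = F.cross_entropy(logits, labels)
    log_p = F.log_softmax(acts, dim=1)                    # per-sample activation distribution
    uniform = torch.full_like(log_p, 1.0 / acts.size(1))  # uniform target distribution
    # KL(uniform || softmax(acts)); minimizing it maximizes activation entropy
    kl = F.kl_div(log_p, uniform, reduction="batchmean")
    return ce + beta * kl
```

In an FL round, each client would plausibly minimize such a regularized objective on its local data before its update is aggregated; the summary attributes the accuracy and communication gains to the resulting similarity of same-class activations across devices.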