Robust and Scalable Federated Learning Framework for Client Data Heterogeneity Based on Optimal Clustering

Bibliographic Details
Published in	Journal of Parallel and Distributed Computing Vol. 195; p. 104990
Main Authors	Li, Zihan; Yuan, Shuai; Guan, Zhitao
Format	Journal Article
Language	English
Published	Elsevier Inc 01.01.2025
Subjects	Clustered federated learning; Heterogeneous data; Low-quality samples; Robustness; Scalability
Summary:	• Enhancing the differences in weights between models of different quality.
• Improving the balance of the client-side sampling process.
• Eliminating the need for pre-specifying the number of clusters.
• Seamlessly incorporating new clients.
• Solving the Non-IID issue while minimizing the impact of low-quality data.

Federated learning is a promising paradigm for applications across a variety of domains. However, several challenges must be addressed in real-world scenarios, particularly data heterogeneity among participating clients. Most existing studies focus on non-independent and identically distributed (Non-IID) data but do not consider the critical aspect of data quality heterogeneity. When some clients contribute low-quality data, the efficacy of models trained through traditional approaches is significantly compromised. Therefore, we propose ROSCFL, a robust and scalable federated learning framework for client data heterogeneity based on optimal clustering. We first develop a cluster contribution evaluation strategy based on the optimal clustering to quantify the contribution of each cluster. Next, we design a robust model aggregation strategy, which mitigates the impact of low-quality data on the global model by optimizing weight allocation and client sampling. Finally, we introduce a client incorporation mechanism to enhance the scalability of ROSCFL. Extensive experiments demonstrate that ROSCFL achieves strong robustness and scalability, particularly in scenarios where data distribution heterogeneity and data quality heterogeneity coexist.
ISSN:	0743-7315
DOI:	10.1016/j.jpdc.2024.104990