Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates


Bibliographic Details
Published in	arXiv.org
Main Authors	Rammal, Ahmad; Gruntkowska, Kaja; Fedin, Nikita; Gorbunov, Eduard; Richtárik, Peter
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 09.03.2024
Subjects	Algorithms; Communication; Convergence; Error feedback; Machine learning; Optimization; Parameterization; Robustness (mathematics)
Summary:	Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning. These problems are usually huge-scale, so communication compression is also imperative for solving them. These factors have spurred recent algorithmic and theoretical developments in the literature on Byzantine-robust learning with compression. In this paper, we contribute to this research area in two main directions. First, we propose a new Byzantine-robust method with compression, Byz-DASHA-PAGE, and prove that it has a better convergence rate (for non-convex and Polyak-Łojasiewicz smooth optimization problems), has a smaller neighborhood size in the heterogeneous case, and tolerates more Byzantine workers under over-parametrization than Byz-VR-MARINA, the previous method with state-of-the-art theoretical convergence guarantees. Second, we develop the first Byzantine-robust method with communication compression and error feedback, Byz-EF21, along with its bidirectional compression version, Byz-EF21-BC, and derive convergence rates for these methods in the non-convex and Polyak-Łojasiewicz smooth cases. We test the proposed methods and illustrate our theoretical findings in numerical experiments.
ISSN:	2331-8422