On Gradient Coding With Partial Recovery

Bibliographic Details
Published in	IEEE Transactions on Communications, Vol. 71, No. 2, pp. 644-657
Main Authors	Sarmasarkar, Sahasrajit; Lalitha, V.; Karamchandani, Nikhil
Format	Journal Article
Language	English
Published	New York: IEEE, 01.02.2023. The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Codes; Coding; Coding theory; Communication; Computation; Distributed computation; Distributed databases; Encoding; Information theory; Load modeling; Lower bounds; Machine learning; Polynomials; Simulation; Straggler mitigation; Task analysis
Summary:	We consider a generalization of the gradient coding framework where a dataset is divided across $n$ workers and each worker transmits to a master node one or more linear combinations of the gradients over its assigned data subsets. Unlike the conventional framework which requires the master node to recover the sum of the gradients over all the data subsets in the presence of straggler workers, we relax the goal to computing the sum of at least some $\alpha$ fraction of the gradients. We begin by deriving a lower bound on the computation load of any scheme and also propose two strategies which achieve this lower bound, albeit at the cost of high communication load and a number of data partitions which can be polynomial in $n$. We then propose schemes based on cyclic assignment which utilize $n$ data partitions and have a lower communication load. When each worker transmits a single linear combination, we prove lower bounds on the computation load of any scheme using $n$ data partitions. Finally, we describe a class of schemes which achieve different intermediate operating points for the computation and communication load and provide simulation results to demonstrate the empirical performance of our schemes.
ISSN:	0090-6778, 1558-0857
DOI:	10.1109/TCOMM.2022.3230779
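
Illustration:	The setting in the summary (cyclic assignment over $n$ partitions, one linear combination per worker, recovery of at least an $\alpha$ fraction of the gradients) can be pictured with a toy sketch. This is not the scheme from the article. It assumes every worker sends the plain sum of its assigned gradients, and that the master only adds messages from non-straggling workers whose partition windows do not overlap. All function names, the parameter `d` (partitions per worker), and the gradient values are hypothetical.

```python
def cyclic_assignment(n, d):
    """Worker i holds partitions i, i+1, ..., i+d-1 (indices mod n)."""
    return [[(i + j) % n for j in range(d)] for i in range(n)]


def vector_sum(vectors):
    """Element-wise sum of equal-length lists."""
    return [sum(column) for column in zip(*vectors)]


def pick_disjoint_workers(n, d, alive):
    """Pick non-straggling workers whose cyclic windows do not overlap.

    Tries each alive worker as the starting point, walks once around the
    circle taking the next alive worker whose window fits, and keeps the
    largest selection found. Assumes d <= n.
    """
    alive = set(alive)
    best = []
    for start in sorted(alive):
        chosen = [start]
        covered_until = start + d  # exclusive end, in unrolled coordinates
        for w in range(start + 1, start + n):
            fits = w >= covered_until and w + d <= start + n
            if (w % n) in alive and fits:
                chosen.append(w % n)
                covered_until = w + d
        if len(chosen) > len(best):
            best = chosen
    return best


def recover_partial_sum(messages, n, d, alive, alpha):
    """Return (partial_sum, chosen_workers).

    partial_sum is None when the chosen workers cover fewer than
    alpha * n partitions.
    """
    chosen = pick_disjoint_workers(n, d, alive)
    if len(chosen) * d < alpha * n:
        return None, chosen
    return vector_sum([messages[w] for w in chosen]), chosen


# Toy run: 6 workers, 6 partitions, 2 partitions per worker (assumed values).
n, d = 6, 2
grads = [[k + 1, 10 * (k + 1)] for k in range(n)]  # made-up gradients
assignment = cyclic_assignment(n, d)
# Each worker transmits a single message: the sum over its window.
messages = [vector_sum([grads[k] for k in parts]) for parts in assignment]

alive = {0, 2, 3, 5}  # workers 1 and 4 are stragglers
partial, chosen = recover_partial_sum(messages, n, d, alive, alpha=0.6)
print(chosen, partial)
```

In this toy run the master covers 4 of the 6 partitions, which meets a target of $\alpha = 0.6$ but not $\alpha = 0.9$. Restricting the master to disjoint windows keeps the decoding trivial, at the price of ignoring the coded combinations that a real gradient coding scheme would exploit.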