Error Compensated Loopless SVRG, Quartz, and SDCA for Distributed Optimization
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 21.09.2021 |
Subjects | |
Online Access | Get full text |
Summary: | The communication of gradients is a key bottleneck in distributed training of large-scale machine learning models. To reduce the communication cost, gradient compression (e.g., sparsification and quantization) and error compensation techniques are often used. In this paper, we propose and study three new efficient methods in this space: the error compensated loopless SVRG method (EC-LSVRG), error compensated Quartz (EC-Quartz), and error compensated SDCA (EC-SDCA). Our methods work with any contraction compressor (e.g., the TopK compressor), and we analyze EC-LSVRG for convex optimization problems in both the composite case and the smooth case. We prove linear convergence rates for both cases and show that in the smooth case the rate has a better dependence on the parameter associated with the contraction compressor. Further, we show that in the smooth case, under certain conditions, error compensated loopless SVRG has the same convergence rate as the vanilla loopless SVRG method. We then show that the convergence rates of EC-Quartz and EC-SDCA in the composite case are as good as that of EC-LSVRG in the smooth case. Finally, numerical experiments are presented to illustrate the efficiency of our methods. |
---|---|
DOI: | 10.48550/arxiv.2109.10049 |
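
The summary refers to contraction compressors such as TopK and to the error compensation (error feedback) mechanism. As a rough illustration only, not code from the paper, the sketch below shows one error-compensated communication round for a single worker; the function names `topk_compress` and `ec_step` and the use of NumPy are my own assumptions.

```python
import numpy as np

def topk_compress(v: np.ndarray, k: int) -> np.ndarray:
    """TopK contraction compressor: keep the k largest-magnitude entries
    of v and zero out the rest. It satisfies the contraction property
    ||C(v) - v||^2 <= (1 - k/d) * ||v||^2, where d = len(v)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out

def ec_step(g: np.ndarray, e: np.ndarray, k: int):
    """One error-compensated round for a single worker: compress the
    gradient estimate plus the accumulated error, communicate the
    compressed message, and carry the residual forward as the new error."""
    message = topk_compress(g + e, k)  # what is actually communicated
    e_next = (g + e) - message         # compression residual, kept locally
    return message, e_next

# Toy usage: the error accumulator stores whatever the compressor discards,
# so dropped coordinates are eventually transmitted in later rounds.
rng = np.random.default_rng(0)
g = rng.normal(size=10)   # stand-in for a local stochastic gradient estimate
e = np.zeros(10)          # error accumulator starts at zero
msg, e = ec_step(g, e, k=3)
```

In EC-LSVRG, `g` would be the loopless SVRG estimator ∇f_i(x) − ∇f_i(w) + ∇f(w) (with the reference point w reset to x with some small probability), but any stochastic gradient estimate fits this error-feedback template.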