Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
| Main Authors | |
| --- | --- |
| Format | Journal Article |
| Language | English |
| Published | 01.02.2019 |
Summary:

We consider decentralized stochastic optimization with the objective function (e.g., data samples for a machine learning task) distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g., quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega = 1$ meaning no compression). We (i) propose a novel gossip-based stochastic gradient descent algorithm, CHOCO-SGD, that converges at rate $\mathcal{O}\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations and $\delta$ the eigengap of the connectivity matrix. Although compression quality and network connectivity affect the higher-order terms, the first term in the rate, $\mathcal{O}(1/(nT))$, is the same as for the centralized baseline with exact communication. We (ii) present a novel gossip algorithm, CHOCO-GOSSIP, for the average consensus problem that converges in time $\mathcal{O}(1/(\delta^2 \omega) \log(1/\epsilon))$ for accuracy $\epsilon > 0$. This is, to the best of our knowledge, the first gossip algorithm that supports arbitrary compressed messages for $\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that CHOCO-SGD can reduce communication by at least two orders of magnitude.
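To make the compression model concrete, here is a minimal sketch of two operators that fit the abstract's framework: an unbiased rand-$k$ operator and the biased top-$k$ sparsifier. The NumPy implementation and the function names `rand_k` and `top_k` are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int) -> np.ndarray:
    """Unbiased rand-k: keep k uniformly random coordinates, rescaled
    by d/k so that E[Q(x)] = x (an unbiased compression operator)."""
    d = x.size
    idx = np.random.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Biased top-k: keep the k largest-magnitude coordinates. Satisfies
    ||Q(x) - x||^2 <= (1 - k/d) ||x||^2, i.e. quality omega = k/d
    in the abstract's notation (omega = 1 recovers no compression)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out
```

Only the $k$ kept coordinates (plus indices) need to be transmitted, which is where the communication savings come from.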
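The linear-convergence claim for CHOCO-GOSSIP rests on each node maintaining a public estimate $\hat{x}_i$ of its own value and communicating only compressed differences, while the gossip averaging step acts on the public estimates. The sketch below simulates this on a single machine; the variable names, the `compress` callback signature, and the dense-matrix simulation are assumptions for illustration, and the step size `gamma` must be tuned to the spectral gap $\delta$ and compression quality $\omega$.

```python
import numpy as np

def choco_gossip(x0, W, compress, gamma, num_iters):
    """Simulated gossip averaging with compressed messages.

    x0       : (n, d) array, row i holds node i's initial vector
    W        : (n, n) symmetric, doubly stochastic mixing matrix
    compress : compression operator Q, applied to one row at a time
    gamma    : consensus step size
    """
    x = x0.copy()
    x_hat = np.zeros_like(x)  # public (compressed) estimates, shared state
    for _ in range(num_iters):
        # each node compresses the difference to its public estimate
        q = np.stack([compress(x[i]) for i in range(x.shape[0])]
                     if False else
                     [compress(x[i] - x_hat[i]) for i in range(x.shape[0])])
        x_hat = x_hat + q  # all nodes apply the received compressed updates
        # gossip step performed on the public estimates only
        x = x + gamma * (W @ x_hat - x_hat)
    return x
```

With `compress = lambda v: v` and `gamma = 1` this reduces to the classical gossip iteration $x^{t+1} = W x^t$; with `top_k` it still converges linearly, at the slower rate $\mathcal{O}(1/(\delta^2 \omega) \log(1/\epsilon))$ quoted above. CHOCO-SGD follows the same communication pattern, interleaving a local stochastic gradient step on $x$ before each compressed gossip exchange.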
DOI: 10.48550/arxiv.1902.00340