An Improved Analysis of Gradient Tracking for Decentralized Machine Learning
Main Authors | Anastasia Koloskova, Tao Lin, Sebastian U. Stich |
---|---|
Format | Journal Article |
Language | English |
Published | 08.02.2022 |
Summary: | 35th Conference on Neural Information Processing Systems (NeurIPS 2021). We consider decentralized machine learning over a network where the training data is distributed across $n$ agents, each of which can compute stochastic model updates on their local data. The agents' common goal is to find a model that minimizes the average of all local loss functions. While gradient tracking (GT) algorithms can overcome a key challenge, namely accounting for differences between workers' local data distributions, the known convergence rates for GT algorithms are not optimal with respect to their dependence on the mixing parameter $p$ (related to the spectral gap of the connectivity matrix). We provide a tighter analysis of the GT method in the stochastic strongly convex, convex and non-convex settings. We improve the dependency on $p$ from $\mathcal{O}(p^{-2})$ to $\mathcal{O}(p^{-1}c^{-1})$ in the noiseless case and from $\mathcal{O}(p^{-3/2})$ to $\mathcal{O}(p^{-1/2}c^{-1})$ in the general stochastic case, where $c \geq p$ is related to the negative eigenvalues of the connectivity matrix (and is a constant in most practical applications). This improvement was possible due to a new proof technique which could be of independent interest. |
DOI: | 10.48550/arxiv.2202.03836 |
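
The abstract describes gradient tracking (GT): besides its local iterate, each agent keeps a tracker that estimates the network-wide average gradient and corrects it with fresh local gradient differences after every gossip round. Below is a minimal sketch of one standard GT variant on a toy decentralized least-squares problem; the ring gossip matrix `W`, the quadratic losses, the step size `eta`, and all variable names are illustrative assumptions, not the exact algorithm, constants, or rates analyzed in the paper.

```python
import numpy as np

# Minimal sketch of a standard gradient-tracking (GT) iteration on a toy
# decentralized least-squares problem. All choices below (ring topology,
# quadratic losses, step size) are illustrative assumptions.

n, d, T, eta = 4, 3, 300, 0.02           # agents, dimension, iterations, step size
rng = np.random.default_rng(0)

# Ring topology: symmetric, doubly stochastic gossip matrix W.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

# Heterogeneous local losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = rng.standard_normal((n, d, d))
b = rng.standard_normal((n, d))

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# GT state: local iterates X and gradient trackers Y (one row per agent),
# initialized with y_i^0 = grad f_i(x_i^0).
X = np.zeros((n, d))
Y = np.stack([grad(i, X[i]) for i in range(n)])

for _ in range(T):
    X_new = W @ (X - eta * Y)                     # gossip step on x_i - eta * y_i
    G_old = np.stack([grad(i, X[i]) for i in range(n)])
    G_new = np.stack([grad(i, X_new[i]) for i in range(n)])
    Y = W @ Y + G_new - G_old                     # tracker follows the average gradient
    X = X_new

# The mixing parameter p in the abstract is governed by the spectral gap of W.
eigvals = np.sort(np.linalg.eigvalsh(W))
print("spectral gap 1 - |lambda_2|:", 1 - max(abs(eigvals[0]), eigvals[-2]))
print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
```

The last two lines point at the quantities in the abstract: the mixing parameter $p$ is governed by the spectral gap of the gossip matrix, while the paper ties $c \geq p$ in the improved $\mathcal{O}(p^{-1}c^{-1})$ and $\mathcal{O}(p^{-1/2}c^{-1})$ rates to the negative eigenvalues of the connectivity matrix.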