Randomized Distributed Mean Estimation: Accuracy vs Communication
We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. Th...
Saved in:
Published in | arXiv.org |
---|---|
Main Authors | , |
Format | Paper |
Language | English |
Published |
Ithaca
Cornell University Library, arXiv.org
22.11.2016
|
Subjects | |
Online Access | Get full text |
ISSN | 2331-8422 |
Cover
Loading…
Summary: | We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an \(\epsilon\)-bit communication and \({\cal O}\left(1/(\epsilon n)\right)\) error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining \(\mathcal{O}(r/n)\) error, where \(r\) is the number of bits used to represent a floating point value. |
---|---|
Bibliography: | content type line 50 SourceType-Working Papers-1 ObjectType-Working Paper/Pre-Print-1 |
ISSN: | 2331-8422 |