Randomized Distributed Mean Estimation: Accuracy vs Communication

Bibliographic Details
Published in	arXiv.org
Main Authors	Konečný, Jakub; Richtárik, Peter
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 22.11.2016
Subjects	Algorithms; Communication; Errors; Floating point arithmetic; Machine learning; Optimization; Randomization
ISSN	2331-8422

Summary:	We consider the problem of estimating the arithmetic average of a finite collection of real vectors stored in a distributed fashion across several compute nodes subject to a communication budget constraint. Our analysis does not rely on any statistical assumptions about the source of the vectors. This problem arises as a subproblem in many applications, including reduce-all operations within algorithms for distributed and federated optimization and learning. We propose a flexible family of randomized algorithms exploring the trade-off between expected communication cost and estimation error. Our family contains the full-communication and zero-error method on one extreme, and an \(\epsilon\)-bit communication and \(\mathcal{O}(1/(\epsilon n))\) error method on the opposite extreme. In the special case where we communicate, in expectation, a single bit per coordinate of each vector, we improve upon existing results by obtaining \(\mathcal{O}(r/n)\) error, where \(r\) is the number of bits used to represent a floating point value.
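
The trade-off described in the summary can be illustrated with a minimal sketch of one unbiased randomized encoder. This is an illustration under stated assumptions, not necessarily the authors' exact protocol: each node keeps a coordinate with a single fixed probability p (a hypothetical parameter), rescales kept coordinates so the encoding stays unbiased, and lets the receiver substitute the node's own mean for dropped coordinates. The function names `encode` and `estimate_mean` are hypothetical.

```python
import random


def encode(x, p, rng):
    """Randomly sparsify one node's vector x (assumes 0 < p <= 1).

    Each coordinate is kept with probability p. Kept coordinates are
    rescaled so that the expected value of every output coordinate
    equals the corresponding input coordinate.
    """
    mu = sum(x) / len(x)  # node-local mean, sent as one extra float
    y = []
    for v in x:
        if rng.random() < p:
            # Kept coordinate: p * (v/p - (1-p)/p * mu) + (1-p) * mu = v
            y.append(v / p - (1.0 - p) / p * mu)
        else:
            # Dropped coordinate: the receiver fills in mu
            y.append(mu)
    return y


def estimate_mean(vectors, p, seed=0):
    """Average the encoded vectors coordinate-wise on the receiver."""
    rng = random.Random(seed)
    encoded = [encode(x, p, rng) for x in vectors]
    n = len(encoded)
    d = len(encoded[0])
    return [sum(y[j] for y in encoded) / n for j in range(d)]
```

With p = 1 every coordinate is transmitted and the estimate is exact, which corresponds to the full-communication, zero-error extreme. Smaller p sends fewer coordinates in expectation at the cost of higher variance in the estimate.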