Balancing Communication and Computation in Distributed Optimization

Bibliographic Details
Published in	IEEE transactions on automatic control Vol. 64; no. 8; pp. 3141 - 3155
Main Authors	Berahas, Albert S., Bollapragada, Raghu, Keskar, Nitish Shirish, Wei, Ermin
Format	Journal Article
Language	English
Published	New York: IEEE, 01.08.2019; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Artificial intelligence; Communication; Computation; Convergence; Distributed algorithms; distributed optimization; Domains; Iterative methods; Linear programming; Machine learning; Measurement; Methods; network optimization; Optimization; optimization algorithms; Optimization methods; Quadratic equations; Robotics
Summary:	Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine learning, robotics, and sensor networks. A distributed optimization method typically consists of two key components: communication and computation. More specifically, at every iteration (or every several iterations) of a distributed algorithm, each node in the network requires some form of information exchange with its neighboring nodes (communication) and the computation step related to a (sub)-gradient (computation). The standard way of judging an algorithm via only the number of iterations overlooks the complexity associated with each iteration. Moreover, various applications deploying distributed methods may prefer a different composition of communication and computation. Motivated by this discrepancy, in this paper, we propose an adaptive cost framework that adjusts the cost measure depending on the features of various applications. We present a flexible algorithmic framework, where communication and computation steps are explicitly decomposed to enable algorithm customization for various applications. We apply this framework to the well-known distributed gradient descent (DGD) method, and show that the resulting customized algorithms, which we call DGD$^t$, NEAR-DGD$^t$, and NEAR-DGD$^+$, compare favorably to their base algorithms, both theoretically and empirically. The proposed NEAR-DGD$^+$ algorithm is an exact first-order method where the communication and computation steps are nested, and when the number of communication steps is adaptively increased, the method converges to the optimal solution. We test the performance and illustrate the flexibility of the methods, as well as practical variants, on quadratic functions and classification problems that arise in machine learning, in terms of iterations, gradient evaluations, communications, and the proposed cost framework.
ISSN:	0018-9286, 1558-2523
DOI:	10.1109/TAC.2018.2880407
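
Illustration:	The summary describes iterations in which a local gradient step (computation) is nested with one or more rounds of neighbor averaging (communication), and notes that increasing the number of communication rounds leads to the optimal solution. The following is a minimal sketch of that idea, not the authors' algorithm as published: the update form, the ring network, the equal mixing weights, the step size, and the scalar quadratic data are all assumptions chosen for illustration, and the names ring_weights, mix, and nested_gradient_consensus are hypothetical.

```python
def ring_weights(n):
    """Doubly stochastic mixing matrix for a ring (assumed topology):
    each node gives weight 1/3 to itself and to each of its two neighbors."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def mix(W, x):
    """One communication round: every node averages its neighbors' values."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def nested_gradient_consensus(a, b, W, alpha, iters, comm_rounds):
    """Node i holds f_i(x) = 0.5 * a[i] * (x - b[i])**2 (assumed local objective).

    Each iteration performs one local gradient step, then comm_rounds(k)
    rounds of averaging with neighbors.
    """
    n = len(a)
    x = [0.0] * n
    for k in range(iters):
        # Computation: each node steps along its own local gradient.
        y = [x[i] - alpha * a[i] * (x[i] - b[i]) for i in range(n)]
        # Communication: repeated averaging over the network.
        for _ in range(comm_rounds(k)):
            y = mix(W, y)
        x = y
    return x


# Assumed toy data: five nodes with different curvatures and targets.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1.0, -1.0, 2.0, 0.0, 3.0]
W = ring_weights(len(a))

# Minimizer of sum_i f_i, available in closed form for this toy problem.
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)

# Fixed number of communication rounds per iteration.
x_fixed = nested_gradient_consensus(a, b, W, 0.1, 200, lambda k: 1)
# Number of communication rounds grows with the iteration counter.
x_grow = nested_gradient_consensus(a, b, W, 0.1, 200, lambda k: k + 1)

err_fixed = max(abs(v - x_star) for v in x_fixed)
err_grow = max(abs(v - x_star) for v in x_grow)
print("error, fixed rounds:  ", err_fixed)
print("error, growing rounds:", err_grow)
```

In this toy setting the run with a fixed number of rounds settles near, but not at, the minimizer, while the run with a growing number of rounds approaches it closely, at the price of many more communications. That trade-off between gradient evaluations and communications is the kind of cost the summary says the proposed framework is meant to weigh.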