Distributed (ATC) Gradient Descent for High Dimension Sparse Regression

Bibliographic Details
Published in	IEEE transactions on information theory Vol. 69; no. 8; p. 1
Main Authors	Ji, Yao; Scutari, Gesualdo; Sun, Ying; Honnappa, Harsha
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2023
Subjects	Algorithms; Convergence; Convexity; Distributed optimization; High-dimension statistics; Linear convergence; Linear regression; Mesh networks; Optimization; Probability; Smoothness; Sparse linear regression; Standard data; Statistical analysis; Tuning
ISSN	0018-9448; 1557-9654
DOI	10.1109/TIT.2023.3267742

Summary:	We study linear regression from data distributed over a network of agents (with no master node) by means of LASSO estimation in high dimension, a regime that allows the ambient dimension to grow faster than the sample size. While there is a vast literature of distributed algorithms applicable to the problem, the statistical and computational guarantees of most of them remain unclear in high dimension. This paper provides a first statistical study of Distributed Gradient Descent (DGD) in the Adapt-Then-Combine (ATC) form. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions, which hold with high probability for standard data-generation models, and under suitable conditions on the network connectivity and algorithm tuning, DGD-ATC converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model. In the worst-case scenario, the total number of communications needed to reach statistical optimality grows logarithmically with the ambient dimension. This improves on DGD in the Combine-Then-Adapt (CTA) form, whose communication complexity scales linearly with the dimension. The comparison reveals that mixing gradient information among agents, as DGD-ATC does, is critical in high dimension to obtain favorable rate scalings.
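To make the ATC/CTA distinction in the summary concrete, here is a minimal pure-Python sketch of one iteration of each form applied to distributed LASSO. It assumes one common way of writing the updates. In ATC, each agent takes a gradient step on its own least-squares loss f_i(x) = ||A_i x - y_i||^2 / (2 n_i), the agents then average these post-gradient iterates with a doubly stochastic mixing matrix W, and finally apply soft thresholding (the proximal map of the l1 penalty). In CTA, agents average the raw iterates and add only their own local gradient. The placement of the proximal step, the step size gamma, the penalty lam, the ring topology with Metropolis weights, and all function names (`soft_threshold`, `local_grad`, `ring_weights`, `dgd_atc_step`, `dgd_cta_step`) are illustrative assumptions, not the paper's specification.

```python
import random


def soft_threshold(v, tau):
    """Entrywise proximal operator of tau * ||.||_1 (soft thresholding)."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def local_grad(A, y, x):
    """Gradient of the assumed local loss f_i(x) = ||A x - y||^2 / (2 n_i)."""
    n_i = len(A)
    r = [sum(a * xj for a, xj in zip(row, x)) - yk for row, yk in zip(A, y)]
    return [sum(A[k][j] * r[k] for k in range(n_i)) / n_i for j in range(len(x))]


def ring_weights(m):
    """Hypothetical Metropolis weights for a ring of m >= 3 agents (1/3 on self and each neighbor)."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] = 1.0 / 3.0
    return W


def dgd_atc_step(X, data, W, gamma, lam):
    """One assumed DGD-ATC iteration: adapt (local gradient step), combine (mix), soft-threshold."""
    m, d = len(X), len(X[0])
    # Adapt: every agent takes a gradient step on its own data.
    half = []
    for i in range(m):
        A, y = data[i]
        g = local_grad(A, y, X[i])
        half.append([xk - gamma * gk for xk, gk in zip(X[i], g)])
    # Combine: average neighbors' post-gradient iterates, so gradient information is mixed.
    new_X = []
    for i in range(m):
        mixed = [sum(W[i][j] * half[j][k] for j in range(m)) for k in range(d)]
        new_X.append(soft_threshold(mixed, gamma * lam))
    return new_X


def dgd_cta_step(X, data, W, gamma, lam):
    """One assumed DGD-CTA iteration: combine iterates first, then add only the local gradient."""
    m, d = len(X), len(X[0])
    new_X = []
    for i in range(m):
        A, y = data[i]
        mixed = [sum(W[i][j] * X[j][k] for j in range(m)) for k in range(d)]
        g = local_grad(A, y, X[i])  # the gradient is NOT mixed across agents
        step = [mk - gamma * gk for mk, gk in zip(mixed, g)]
        new_X.append(soft_threshold(step, gamma * lam))
    return new_X


if __name__ == "__main__":
    random.seed(0)
    m, n_i, d = 4, 5, 30  # 20 samples in total, ambient dimension 30 > 20
    x_true = [0.0] * d
    x_true[0], x_true[7] = 2.0, -1.5  # hypothetical 2-sparse ground truth
    data = []
    for _ in range(m):
        A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_i)]
        y = [sum(a * b for a, b in zip(row, x_true)) + 0.01 * random.gauss(0, 1) for row in A]
        data.append((A, y))
    W = ring_weights(m)
    X = [[0.0] * d for _ in range(m)]
    for _ in range(500):
        X = dgd_atc_step(X, data, W, gamma=0.05, lam=0.05)
    print("agent 0, coordinates 0 and 7:", round(X[0][0], 3), round(X[0][7], 3))
```

With equal local sample sizes and a doubly stochastic W, the network average in this sketch roughly targets the centralized objective (1/(2N))||y - Ax||^2 + lam ||x||_1, up to the consensus error that constant-step DGD leaves. Replacing `dgd_atc_step` with `dgd_cta_step` in the loop allows a side-by-side comparison on the same synthetic data. The sketch does not attempt to reproduce the paper's communication-complexity results.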