An accelerated variance reducing stochastic method with Douglas-Rachford splitting
| Published in | Machine Learning, Vol. 108, No. 5, pp. 859–878 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.05.2019 (Springer Nature B.V.) |
Summary: We consider the problem of minimizing a regularized empirical risk function, represented as the average of a large number of convex loss functions plus a possibly non-smooth convex regularization term. In this paper, we propose a fast variance reducing (VR) stochastic method called Prox2-SAGA. Unlike traditional VR stochastic methods, Prox2-SAGA replaces the stochastic gradient of the loss function with the corresponding gradient mapping, and it additionally computes the gradient mapping of the regularization term. Together, these two gradient mappings constitute a Douglas-Rachford splitting step. For strongly convex and smooth loss functions, we prove that Prox2-SAGA achieves a linear convergence rate comparable to other accelerated VR stochastic methods, while being more practical because only the stepsize needs to be tuned. When each loss function is smooth but non-strongly convex, we prove a convergence rate of O(1/k) for the proposed Prox2-SAGA method, where k is the number of iterations. Moreover, experiments show that Prox2-SAGA remains effective for non-smooth loss functions and, for strongly convex and smooth loss functions, is markedly faster when the loss functions are ill-conditioned.
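The abstract says that Prox2-SAGA combines two gradient (proximal) mappings into a Douglas-Rachford splitting step, but the record does not reproduce the algorithm itself. The sketch below shows only the standard, full-batch Douglas-Rachford iteration that such a step builds on, applied to a simple composite problem. The function names, the quadratic loss, the ℓ1 regularizer, and the parameters `gamma`, `lam_f`, `lam_g` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prox_l2(v, gamma, lam):
    """Proximal operator of f(x) = (lam/2)*||x||^2 (smooth stand-in for the loss)."""
    return v / (1.0 + gamma * lam)

def prox_l1(v, gamma, lam):
    """Proximal operator of g(x) = lam*||x||_1 (soft-thresholding, non-smooth regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

def douglas_rachford(z0, gamma, lam_f, lam_g, iters=100):
    """Standard Douglas-Rachford splitting for min_x f(x) + g(x).

    Each iteration applies the two proximal operators in sequence and updates
    the auxiliary point z; under standard convexity assumptions,
    x = prox_{gamma f}(z) converges to a minimizer.
    """
    z = z0.copy()
    for _ in range(iters):
        x = prox_l2(z, gamma, lam_f)            # prox of the loss term
        y = prox_l1(2.0 * x - z, gamma, lam_g)  # prox of the regularizer at the reflected point
        z = z + y - x                           # averaging (relaxation) step
    return prox_l2(z, gamma, lam_f)

# Example: minimize (lam_f/2)||x||^2 + lam_g*||x||_1 from a random starting point.
x_star = douglas_rachford(np.random.randn(10), gamma=1.0, lam_f=0.1, lam_g=0.05)
```

Prox2-SAGA, as described in the abstract, differs from this sketch in that the loss prox is replaced by a stochastic, variance-reduced gradient mapping of a single sampled loss function per iteration.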
ISSN: 0885-6125; 1573-0565
DOI: 10.1007/s10994-019-05785-3