Efficient projection-free online convex optimization using stochastic gradients

Bibliographic Details
Published in: Machine Learning, Vol. 114, No. 4, p. 93
Main Authors: Xie, Jiahao; Zhang, Chao; Shen, Zebang; Qian, Hui
Format: Journal Article
Language: English
Published: New York: Springer US, 01.04.2025 (Springer Nature B.V.)

Summary: We consider Online Convex Optimization (OCO) problems subject to a compact convex set. An important class of projection-free online methods known as Frank–Wolfe-type (FW-type) methods has attracted considerable attention in the machine learning community, as these methods eschew the expensive projection operation and only require a simple linear minimization oracle in each round. Recently, the stochastic gradient technique has been integrated into FW-type online methods to circumvent the expensive full-gradient computation and further reduce the per-round computational cost. However, these methods generally have high regret bounds due to high variance in gradient estimation. Although adopting a large minibatch in stochastic gradients can reduce the variance, it would in turn increase the per-round computational cost. In this paper, we develop efficient FW-type methods that need only stochastic gradients with small minibatches and achieve nearly optimal regret bounds with low per-round costs. We first exploit the similarity between gradients of decision variables in consecutive rounds and construct a lightweight variance-reduced estimator by utilizing historical gradient information. Based on this estimator, we propose a method named OFWRG for smooth problems in the stochastic setting. We prove that OFWRG achieves a nearly optimal regret bound with the lowest O(1) per-round computational cost; it is the first method with such a nearly optimal result in this setting. We further extend OFWRG to OCO problems in other settings, including smooth problems in the adversarial setting and a class of non-smooth problems in the stochastic and adversarial settings. Our theoretical analyses show that these extensions of OFWRG achieve nearly optimal regret bounds and low per-round computational costs under mild conditions. Experimental results demonstrate the efficiency of our methods.
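
The summary describes two ingredients: a linear minimization oracle (LMO) in place of projection, and a variance-reduced gradient estimator that reuses historical gradient information because consecutive decisions, and hence their gradients, stay close. Below is a minimal Python sketch of how such a combination can look, assuming an LMO over the l1 ball and a STORM-style recursive estimator; the names (`lmo_l1_ball`, `projection_free_round`, `stoch_grad`), step sizes, and mixing weights are illustrative stand-ins, not the actual OFWRG recursion or schedule from the paper.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||v||_1 <= radius} <g, v> puts all mass on the
    largest-magnitude coordinate of g, with opposite sign."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def projection_free_round(x, d, t, stoch_grad, radius=1.0):
    """One round of a projection-free (FW-type) online update with a
    recursive variance-reduced estimator (illustrative, not OFWRG's
    exact recursion).

    stoch_grad(x, t) is a hypothetical oracle returning a small-minibatch
    stochastic gradient at x; for a fixed round t it must reuse the same
    minibatch for both evaluations below so the correction is consistent.
    """
    v = lmo_l1_ball(d, radius)        # LMO call replaces the projection
    eta = 1.0 / (t + 1)               # assumed diminishing step size
    x_new = x + eta * (v - x)         # convex-combination FW step
    rho = 1.0 / (t + 1)               # assumed mixing weight
    # Reuse the previous estimate d and correct it with the gradient
    # difference at consecutive iterates; that difference is small
    # because x_new stays close to x, which keeps the estimator's
    # variance low without a large minibatch.
    d_new = stoch_grad(x_new, t) + (1.0 - rho) * (d - stoch_grad(x, t))
    return x_new, d_new
```

Running T rounds would amount to initializing d with a single minibatch gradient at x0 and iterating `x, d = projection_free_round(x, d, t, stoch_grad)`; each round then costs one LMO call and two small-minibatch gradient evaluations, in line with the O(1) per-round cost the summary claims for OFWRG.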
ISSN: 0885-6125
EISSN: 1573-0565
DOI: 10.1007/s10994-024-06640-w