Efficient projection-free online convex optimization using stochastic gradients

Bibliographic Details
Published in: Machine Learning, Vol. 114, No. 4, p. 93
Main Authors: Xie, Jiahao; Zhang, Chao; Shen, Zebang; Qian, Hui
Format: Journal Article
Language: English
Published: New York: Springer US, 01.04.2025 (Springer Nature B.V.)

Summary: We consider Online Convex Optimization (OCO) problems subject to a compact convex set. An important class of projection-free online methods known as Frank–Wolfe-type (FW-type) methods has attracted considerable attention in the machine learning community, as these methods eschew the expensive projection operation and only require a simple linear minimization oracle in each round. Recently, the stochastic gradient technique has been integrated into FW-type online methods to circumvent the expensive full-gradient computation and further reduce the per-round computational cost. However, these methods generally have high regret bounds due to high variance in gradient estimation. Although adopting a large minibatch in stochastic gradients can reduce the variance, it would in turn increase the per-round computational cost. In this paper, we develop efficient FW-type methods that need only stochastic gradients with small minibatches and achieve nearly optimal regret bounds with low per-round costs. We first exploit the similarity between gradients of decision variables in consecutive rounds and construct a lightweight variance-reduced estimator by utilizing historical gradient information. Based on this estimator, we propose a method named OFWRG for smooth problems in the stochastic setting. We prove that OFWRG achieves a nearly optimal regret bound with the lowest O(1) per-round computational cost; it is the first method with such a nearly optimal result in this setting. We further extend OFWRG to OCO problems in other settings, including smooth problems in the adversarial setting and a class of non-smooth problems in the stochastic and adversarial settings. Our theoretical analyses show that these extensions of OFWRG achieve nearly optimal regret bounds and low per-round computational costs under mild conditions. Experimental results demonstrate the efficiency of our methods.
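
The summary describes two ingredients: a linear minimization oracle (LMO) in place of projection, and a variance-reduced gradient estimator that reuses historical gradient information because consecutive decisions, and hence their gradients, stay close. Below is a minimal Python sketch of how such a combination can look, assuming an LMO over the l1 ball and a STORM-style recursive estimator; the names (`lmo_l1_ball`, `projection_free_round`, `stoch_grad`), step sizes, and mixing weights are illustrative stand-ins, not the actual OFWRG recursion or schedule from the paper.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||v||_1 <= radius} <g, v> puts all mass on the
    largest-magnitude coordinate of g, with opposite sign."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def projection_free_round(x, d, t, stoch_grad, radius=1.0):
    """One round of a projection-free (FW-type) online update with a
    recursive variance-reduced estimator (illustrative, not OFWRG's
    exact recursion).

    stoch_grad(x, t) is a hypothetical oracle returning a small-minibatch
    stochastic gradient at x; for a fixed round t it must reuse the same
    minibatch for both evaluations below so the correction is consistent.
    """
    v = lmo_l1_ball(d, radius)        # LMO call replaces the projection
    eta = 1.0 / (t + 1)               # assumed diminishing step size
    x_new = x + eta * (v - x)         # convex-combination FW step
    rho = 1.0 / (t + 1)               # assumed mixing weight
    # Reuse the previous estimate d and correct it with the gradient
    # difference at consecutive iterates; that difference is small
    # because x_new stays close to x, which keeps the estimator's
    # variance low without a large minibatch.
    d_new = stoch_grad(x_new, t) + (1.0 - rho) * (d - stoch_grad(x, t))
    return x_new, d_new
```

Running T rounds would amount to initializing d with a single minibatch gradient at x0 and iterating `x, d = projection_free_round(x, d, t, stoch_grad)`; each round then costs one LMO call and two small-minibatch gradient evaluations, in line with the O(1) per-round cost the summary claims for OFWRG.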
ISSN: 0885-6125
EISSN: 1573-0565
DOI: 10.1007/s10994-024-06640-w