Efficient projection-free online convex optimization using stochastic gradients
Published in | Machine learning Vol. 114; no. 4; p. 93 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | New York; Springer US; 01.04.2025; Springer Nature B.V |
Summary: | We consider Online Convex Optimization (OCO) problems constrained to a compact convex set. An important class of projection-free online methods, known as Frank–Wolfe-type (FW-type) methods, has attracted considerable attention in the machine learning community, as these methods eschew the expensive projection operation and require only a simple linear minimization oracle in each round. Recently, the stochastic gradient technique has been integrated into FW-type online methods to circumvent the expensive full gradient computation and further reduce the per-round computational cost. However, these methods generally have high regret bounds due to high variance in gradient estimation. Although adopting a large minibatch in stochastic gradients can reduce the variance, it would in turn increase the per-round computational cost. In this paper, we develop efficient FW-type methods that need only stochastic gradients with a small minibatch and achieve nearly optimal regret bounds with low per-round costs. We first explore the similarity between gradients of decision variables in consecutive rounds, and construct a lightweight variance-reduced estimator by utilizing historical gradient information. Based on this estimator, we propose a method named OFWRG for smooth problems in the stochastic setting. We prove that OFWRG achieves a nearly optimal regret bound with the lowest $O(1)$ per-round computational cost. OFWRG is the first method with such a nearly optimal result in this setting. We further extend OFWRG to OCO problems in other settings, including smooth problems in the adversarial setting and a class of non-smooth problems in the stochastic and adversarial settings. Our theoretical analyses show that these extensions of OFWRG achieve nearly optimal regret bounds and low per-round computational costs under mild conditions. Experimental results demonstrate the efficiency of our methods. |
---|---|
ISSN: | 0885-6125 1573-0565 |
DOI: | 10.1007/s10994-024-06640-w |
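
The summary describes projection-free online updates that call a linear minimization oracle each round and reuse historical gradient information to reduce variance. The following is a minimal illustrative sketch of that general idea, not the OFWRG method itself, whose exact estimator and step sizes are not given in this record. Everything here is an assumption made for illustration: the feasible set (the probability simplex), the toy quadratic loss, the recursive correction term, the weights `rho = 1/t` and `eta = 2/(t+2)`, and the function names `lmo_simplex`, `stochastic_grad` and `online_fw_recursive`.

```python
import random


def lmo_simplex(direction):
    """Linear minimization oracle for the probability simplex.

    Minimizing <direction, v> over the simplex is attained at a vertex e_i
    with i = argmin_i direction[i], so no projection is needed.
    """
    best = min(range(len(direction)), key=lambda i: direction[i])
    return [1.0 if i == best else 0.0 for i in range(len(direction))]


def stochastic_grad(x, sample):
    """Gradient of the assumed toy loss 0.5 * ||x - sample||^2."""
    return [xi - si for xi, si in zip(x, sample)]


def online_fw_recursive(target, rounds=3000, noise=0.2, seed=0):
    """Hypothetical online Frank-Wolfe loop with a recursive gradient estimator."""
    rng = random.Random(seed)
    dim = len(target)
    x = [1.0 / dim] * dim      # feasible start: centre of the simplex
    x_prev = x
    d = [0.0] * dim            # running gradient estimate
    for t in range(1, rounds + 1):
        # One noisy sample per round (minibatch size 1).
        sample = [b + rng.gauss(0.0, noise) for b in target]
        g = stochastic_grad(x, sample)
        g_prev = stochastic_grad(x_prev, sample)
        rho = 1.0 / t          # assumed mixing weight; rho = 1 at t = 1
        # Reuse the previous estimate, corrected by the gradient change
        # between consecutive iterates evaluated on the same sample.
        d = [gi + (1.0 - rho) * (di - gpi) for gi, di, gpi in zip(g, d, g_prev)]
        v = lmo_simplex(d)     # one oracle call per round
        eta = 2.0 / (t + 2)    # classic Frank-Wolfe step size
        x_prev = x
        x = [(1.0 - eta) * xi + eta * vi for xi, vi in zip(x, v)]
    return x
```

Calling `online_fw_recursive([0.2, 0.5, 0.3])` returns a point on the simplex that should lie close to the noise-free target. Each round uses a single sample, two gradient evaluations and one oracle call, which is the kind of constant per-round cost the summary refers to, though the regret guarantees stated there apply to the paper's method and not to this sketch.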