Zeroth-order algorithms for nonconvex–strongly-concave minimax problems with improved complexities

Bibliographic Details
Published in	Journal of Global Optimization, Vol. 87, no. 2–4, pp. 709–740
Main Authors	Wang, Zhongruo; Balasubramanian, Krishnakumar; Ma, Shiqian; Razaviyayn, Meisam
Format	Journal Article
Language	English
Published	New York: Springer US, 01.11.2023
Subjects	Algorithms; Comparative analysis; Computer Science; Machine learning; Mathematics; Mathematics and Statistics; Operations Research/Decision Theory; Optimization; Real Functions; Stochastic algorithms; Oracle complexity; Gradient descent ascent; Minimax problem; Zeroth-order algorithms

More Information
Summary:	In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other. Such minimax problems have recently attracted significant attention due to their applications in modern machine learning tasks. We first consider a deterministic version of the problem. We design and analyze the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm and obtain improved oracle complexity compared with existing works. We also propose the Zeroth-Order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which significantly improves the oracle complexity of ZO-GDA. We then consider stochastic versions of ZO-GDA and ZO-GDMSA to handle stochastic nonconvex minimax problems. For this case, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which is known to be satisfied by modern over-parameterized machine learning models. We establish that under the SGC assumption, the complexities of the stochastic algorithms match those of the deterministic algorithms. Numerical experiments are presented to support our theoretical results.
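The summary names ZO-GDA and ZO-GDMSA without stating their update rules. As a rough illustration only, the sketch below assumes a one-sided Gaussian-smoothing gradient estimator (function values only, no gradients) and an alternating update order; the names `zo_gradient` and `zo_gda`, the smoothing radius `mu`, the sample count, and all step sizes are hypothetical choices, not the paper's. Setting `ascent_steps=1` gives a plain descent-ascent loop, while `ascent_steps > 1` mimics the multi-step ascent idea behind ZO-GDMSA.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def zo_gradient(f: Callable[[Vector], float], z: Vector, mu: float,
                num_samples: int, rng: random.Random) -> Vector:
    """Estimate grad f(z) from function values only (Gaussian smoothing).

    Assumption: this is one common zeroth-order estimator; the paper's exact
    estimator and constants may differ.
    """
    fz = f(z)
    g = [0.0] * len(z)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in z]           # random Gaussian direction
        z_pert = [zi + mu * ui for zi, ui in zip(z, u)]
        scale = (f(z_pert) - fz) / mu                  # finite-difference slope along u
        for i, ui in enumerate(u):
            g[i] += scale * ui / num_samples           # average of slope * direction
    return g


def zo_gda(f: Callable[[Vector, Vector], float], x0: Vector, y0: Vector,
           eta_x: float, eta_y: float, iters: int, ascent_steps: int = 1,
           mu: float = 1e-4, num_samples: int = 20,
           seed: int = 0) -> Tuple[Vector, Vector]:
    """Hypothetical zeroth-order descent-ascent loop for min_x max_y f(x, y).

    ascent_steps=1: one ascent step per descent step (GDA-style).
    ascent_steps>1: several ascent steps per descent step (multi-step style).
    The alternating update order is an assumption of this sketch.
    """
    rng = random.Random(seed)
    x, y = list(x0), list(y0)
    for _ in range(iters):
        for _ in range(ascent_steps):
            # Ascent on the strongly concave variable y with x held fixed.
            gy = zo_gradient(lambda v: f(x, v), y, mu, num_samples, rng)
            y = [yi + eta_y * gi for yi, gi in zip(y, gy)]
        # Descent on the nonconvex variable x at the current y.
        gx = zo_gradient(lambda v: f(v, y), x, mu, num_samples, rng)
        x = [xi - eta_x * gi for xi, gi in zip(x, gx)]
    return x, y


if __name__ == "__main__":
    # Hypothetical test objective: nonconvex in x, strongly concave in y.
    def objective(x: Vector, y: Vector) -> float:
        return math.log(1.0 + x[0] ** 2) + x[0] * y[0] - y[0] ** 2

    x_final, y_final = zo_gda(objective, [2.0], [0.0], eta_x=0.05,
                              eta_y=0.2, iters=500, ascent_steps=5)
    print(x_final, y_final)
```

On the hypothetical objective f(x, y) = log(1 + x^2) + x*y - y^2, whose only stationary point is (0, 0), both variants should drive the iterates close to that point. The multi-step variant spends extra function evaluations on y so that each x-step uses a better approximation of the inner maximizer; how this trades off in oracle complexity is what the paper analyzes, and this toy loop makes no claim about those bounds.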
ISSN:	0925-5001 (print); 1573-2916 (electronic)
DOI:	10.1007/s10898-022-01160-0