Zeroth-order algorithms for nonconvex–strongly-concave minimax problems with improved complexities

Bibliographic Details
Published in: Journal of Global Optimization, Vol. 87, no. 2–4, pp. 709–740
Main Authors: Wang, Zhongruo; Balasubramanian, Krishnakumar; Ma, Shiqian; Razaviyayn, Meisam
Format: Journal Article
Language: English
Publisher: Springer US, New York
Published: 01.11.2023
Summary: In this paper, we study zeroth-order algorithms for minimax optimization problems that are nonconvex in one variable and strongly concave in the other. Such problems have attracted significant attention recently due to their applications in modern machine learning. We first consider a deterministic version of the problem: we design and analyze the Zeroth-Order Gradient Descent Ascent (ZO-GDA) algorithm and obtain oracle complexities that improve on those of existing works. We also propose the Zeroth-Order Gradient Descent Multi-Step Ascent (ZO-GDMSA) algorithm, which significantly improves on the oracle complexity of ZO-GDA. We then consider stochastic versions of ZO-GDA and ZO-GDMSA to handle stochastic nonconvex minimax problems. In this setting, we provide oracle complexity results under two assumptions on the stochastic gradient: (i) the uniformly bounded variance assumption, which is common in traditional stochastic optimization, and (ii) the Strong Growth Condition (SGC), which is known to be satisfied by modern over-parameterized machine learning models. We establish that under the SGC, the complexities of the stochastic algorithms match those of the deterministic algorithms. Numerical experiments are presented to support our theoretical results.
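The summary names ZO-GDA and ZO-GDMSA without giving their update rules. As a rough illustration only, the sketch below shows the general shape of a zeroth-order descent-ascent loop for min over x of max over y of f(x, y), using only function evaluations. It assumes a two-point Gaussian-smoothing gradient estimator and an alternating update order (ascent in y, then descent in x); the function names (`zo_gradient`, `zo_gd_multistep_ascent`, `toy_objective`), step sizes, smoothing radius `mu`, and number of random directions are hypothetical choices, not the paper's algorithms or parameters.

```python
import numpy as np


def zo_gradient(fun, z, mu, num_dirs, rng):
    """Two-point Gaussian-smoothing estimate of the gradient of `fun` at `z`.

    Uses only function values: averages (fun(z + mu*u) - fun(z)) / mu * u
    over `num_dirs` standard Gaussian directions u.
    """
    f0 = fun(z)
    grad = np.zeros_like(z, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(z.shape)
        grad += (fun(z + mu * u) - f0) / mu * u
    return grad / num_dirs


def zo_gd_multistep_ascent(f, x0, y0, eta_x, eta_y, ascent_steps, iters,
                           mu=1e-4, num_dirs=20, seed=0):
    """Hypothetical zeroth-order descent-ascent loop for min_x max_y f(x, y).

    Each outer iteration takes `ascent_steps` zeroth-order ascent steps in y,
    then one zeroth-order descent step in x at the updated y.
    ascent_steps=1 gives an alternating GDA-style update; ascent_steps > 1
    mimics the multi-step-ascent idea. Not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()
    for _outer in range(iters):
        for _inner in range(ascent_steps):
            gy = zo_gradient(lambda v: f(x, v), y, mu, num_dirs, rng)
            y = y + eta_y * gy  # ascent on the strongly concave variable
        gx = zo_gradient(lambda v: f(v, y), x, mu, num_dirs, rng)
        x = x - eta_x * gx      # descent on the (possibly nonconvex) variable
    return x, y


def toy_objective(x, y):
    # Hypothetical test problem: strongly concave in y, and
    # max_y f(x, y) = ||x||^2 / 2, so the solution is x = y = 0.
    # (It is convex in x; chosen only because the answer is known.)
    return x @ y - 0.5 * (y @ y)


if __name__ == "__main__":
    for steps in (1, 5):
        x, y = zo_gd_multistep_ascent(toy_objective, [1.0, -2.0], [0.0, 0.0],
                                      eta_x=0.05, eta_y=0.5,
                                      ascent_steps=steps, iters=500)
        print(steps, np.linalg.norm(x), np.linalg.norm(y - x))
```

The ascent step size can be larger than the descent step size here because the toy objective's y-part is well conditioned; in general both would be tied to smoothness and strong-concavity constants, which this sketch does not attempt to model.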
ISSN: 0925-5001, 1573-2916
DOI: 10.1007/s10898-022-01160-0