On Stability and Generalization of Bilevel Optimization Problem
Main Authors | |
Format | Journal Article |
Language | English |
Published | 03.10.2022 |
Subjects | |
Summary: | (Stochastic) bilevel optimization is a frequently encountered problem in machine learning, with a wide range of applications such as meta-learning, hyper-parameter optimization, and reinforcement learning. Most existing studies of this problem focus only on analyzing convergence or improving the convergence rate, while little effort has been devoted to understanding its generalization behavior. In this paper, we conduct a thorough analysis of the generalization of first-order (gradient-based) methods for the bilevel optimization problem. We first establish a fundamental connection between algorithmic stability and generalization error in different forms, and give a high-probability generalization bound which improves the previous best one from $\mathcal{O}(\sqrt{n})$ to $\mathcal{O}(\log n)$, where $n$ is the sample size. We then provide the first stability bounds for the general case where both inner- and outer-level parameters are subject to continuous update, whereas existing work allows only the outer-level parameter to be updated. Our analysis applies in various standard settings such as strongly-convex-strongly-convex (SC-SC), convex-convex (C-C), and nonconvex-nonconvex (NC-NC). Our analysis for the NC-NC setting can also be extended to a particular nonconvex-strongly-convex (NC-SC) setting that is commonly encountered in practice. Finally, we corroborate our theoretical analysis and demonstrate, through experiments on meta-learning and hyper-parameter optimization, how the number of iterations affects the generalization error. |
DOI: | 10.48550/arxiv.2210.01063 |
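For context, a bilevel problem takes the standard form $\min_x f(x, y^*(x))$ subject to $y^*(x) = \arg\min_y g(x, y)$, and the summary above concerns first-order (gradient-based) methods in which both the inner parameter $y$ and the outer parameter $x$ are updated at every iteration. The following minimal single-loop sketch illustrates that setting on a toy strongly-convex instance; the objectives, step sizes, and the partial-gradient approximation of the hypergradient are illustrative assumptions, not the specific algorithm or analysis of the paper.

```python
import numpy as np

# Toy strongly-convex-strongly-convex (SC-SC) bilevel instance, for
# illustration only -- the objectives and step sizes below are assumptions.
rng = np.random.default_rng(0)
n, d = 50, 5                # number of samples, parameter dimension
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def inner_grad(x, y):
    # grad_y g(x, y): inner (lower-level) objective, here a ridge-style fit
    # whose regularization weight exp(x) is controlled by the outer variable.
    return A.T @ (A @ y - b) / n + np.exp(x) * y

def outer_grad(x, y):
    # grad_x f(x, y): outer (upper-level) objective, here a penalty that
    # depends directly on x.  A full hypergradient would also need dy*/dx;
    # using only this partial gradient is a simple first-order approximation.
    return np.exp(x) * (y @ y) / 2 + 0.1 * x

x = 0.0                     # outer-level parameter (e.g. a hyper-parameter)
y = np.zeros(d)             # inner-level parameter (e.g. model weights)
alpha, beta = 0.05, 0.01    # inner / outer step sizes (assumed constants)

# Single-loop scheme: both levels are updated at every iteration, matching
# the "continuous update of inner and outer parameters" case in the summary.
for t in range(500):
    y = y - alpha * inner_grad(x, y)
    x = x - beta * outer_grad(x, y)

print(f"x = {x:.4f}, inner loss = {np.linalg.norm(A @ y - b)**2 / (2 * n):.4f}")
```

The number of iterations of this loop is exactly the quantity whose effect on generalization error the paper's stability analysis and experiments examine.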