Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data
| Format | Journal Article |
| --- | --- |
| Language | English |
| Published | 27.02.2025 |
| DOI | 10.48550/arxiv.2502.19977 |
Summary: Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system dynamics. However, this information is not always available, and in such cases trajectory data can be used to approximate first-order information. When the data are noisy, gradient estimates become inaccurate, and a formal investigation that encompasses uncertainty estimation and the analysis of its propagation through the algorithm is currently missing. To address this, our work focuses on the Linear Quadratic Regulator (LQR) problem for systems subject to additive stochastic noise. After briefly summarizing the state of the art for cases with a known model, we focus on scenarios where the system dynamics are unknown and approximate gradient information is obtained using zeroth-order optimization techniques. We analyze the theoretical properties by computing the error in the estimated gradient and examining how this error affects the convergence of PG algorithms. Additionally, we provide global convergence guarantees for various versions of PG methods, including those employing adaptive step sizes and variance reduction techniques, which help increase the convergence rate and reduce sample complexity. One contribution of this work is the study of the robustness of model-free PG methods, aiming to identify their limitations in the presence of noise and to propose improvements that enhance their applicability. Numerical simulations show that these theoretical analyses provide valuable guidance for tuning the algorithm parameters, thereby making these methods more reliable in practically relevant scenarios.
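To make the setting concrete, the following is a minimal sketch of the kind of zeroth-order (two-point) gradient estimation the abstract refers to, applied to a toy LQR instance with additive stochastic noise. It is not the authors' implementation: the system matrices, horizon, smoothing radius, number of samples, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) LQR instance: stable A, single input, quadratic costs.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

def rollout_cost(K, horizon=100, noise_std=0.05):
    """Empirical finite-horizon cost of the policy u = -K x under additive noise."""
    x = np.array([1.0, 0.0])
    cost = 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + noise_std * rng.standard_normal(2)
    return cost

def zeroth_order_gradient(K, radius=0.05, num_samples=20):
    """Two-point zeroth-order estimate of the gradient of the cost w.r.t. K."""
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)                     # random unit direction
        delta = rollout_cost(K + radius * U) - rollout_cost(K - radius * U)
        grad += (d * delta / (2.0 * radius)) * U   # smoothed finite difference
    return grad / num_samples

# Plain model-free policy gradient descent on the feedback gain K.
K = np.zeros((1, 2))
step_size = 1e-4
for _ in range(50):
    K -= step_size * zeroth_order_gradient(K)
```

The adaptive step-size and variance-reduction variants analyzed in the paper would replace the fixed `step_size` and the plain Monte Carlo average above; the sketch only illustrates how noisy rollouts yield approximate first-order information.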