Bundled Gradients through Contact via Randomized Smoothing
| Main Authors | |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | 10.09.2021 |
| Subjects | |
| Online Access | Get full text |
Summary: The empirical success of derivative-free methods in reinforcement learning for planning through contact seems at odds with the perceived fragility of classical gradient-based optimization methods in these domains. What is causing this gap, and how might we use the answer to improve gradient-based methods? We believe a stochastic formulation of dynamics is one crucial ingredient. We use tools from randomized smoothing to analyze sampling-based approximations of the gradient, and formalize such approximations through the gradient bundle. We show that using the gradient bundle in lieu of the exact gradient mitigates the fast-changing gradients of non-smooth contact dynamics modeled by implicit time-stepping or the penalty method. Finally, we apply the gradient bundle to optimal control using iLQR, introducing a novel algorithm that improves convergence over using exact gradients. Combining our algorithm with a convex implicit time-stepping formulation of contact, we show that we can tractably tackle planning-through-contact problems in manipulation.
DOI: 10.48550/arxiv.2109.05143
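To make the summary's central idea concrete, the sketch below is a minimal illustration, not code from the paper: it averages exact gradients of a toy non-smooth function at Gaussian-perturbed inputs, which is the sampling-based randomized-smoothing estimator the abstract refers to as the gradient bundle. The function `dynamics`, the noise scale `sigma`, and the sample count are placeholder choices for illustration; the paper applies the bundle to contact dynamics inside iLQR.

```python
import jax
import jax.numpy as jnp

def dynamics(x):
    # Toy non-smooth map standing in for contact dynamics: a "wall" at x = 0.
    # Its exact gradient is piecewise constant (0 or 1) and flips abruptly
    # across the contact boundary, which is what destabilizes naive
    # gradient-based optimization.
    return jnp.maximum(x, 0.0)

def bundled_gradient(f, x, sigma=0.1, num_samples=100, key=jax.random.PRNGKey(0)):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed f:
    (1/N) * sum_i grad f(x + w_i), with w_i ~ N(0, sigma^2)."""
    noise = sigma * jax.random.normal(key, (num_samples,) + x.shape)
    grads = jax.vmap(jax.grad(f))(x + noise)  # exact gradients at sampled points
    return grads.mean(axis=0)                 # their average: the bundled gradient

x = jnp.array(0.0)
print(jax.grad(dynamics)(x))          # exact (sub)gradient at the boundary
print(bundled_gradient(dynamics, x))  # bundled gradient: roughly 0.5, varies smoothly in x
```

At the boundary, roughly half of the sampled points see gradient 0 and half see gradient 1, so the bundle is about 0.5 and changes smoothly as x moves, which is the smoothing effect the summary credits for improved convergence.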