Automatic Trade-off Adaptation in Offline RL
Format | Journal Article
Language | English
Published | 16.06.2023
Summary: Recently, offline RL algorithms have been proposed that remain adaptive at runtime. For example, the LION algorithm \cite{lion} provides the user with an interface to set, at runtime, the trade-off between behavior cloning and optimality with respect to the estimated return. Experts can then use this interface to adapt the policy's behavior to their preferences and find a good trade-off between conservatism and performance optimization. Since expert time is precious, we extend the methodology with an autopilot that automatically finds the correct parameterization of the trade-off, yielding a new algorithm that we term AutoLION.
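To make the trade-off described above concrete, the following is a minimal sketch of a lambda-weighted objective of the kind such an interface could expose. The function name `trade_off_loss`, the mean-squared-error form of the behavior-cloning term, and the specific weighting are illustrative assumptions, not the loss actually used by LION or AutoLION.

```python
import numpy as np

def trade_off_loss(policy_action, behavior_action, estimated_return, lam):
    """Hypothetical lambda-weighted trade-off objective (illustration only).

    lam = 0.0 -> pure behavior cloning (stay close to the data-collecting policy).
    lam = 1.0 -> pure return maximization w.r.t. the estimated return.
    Intermediate values interpolate between conservatism and optimality.
    """
    # Behavior-cloning term: squared distance to the logged (behavior) action.
    bc_loss = float(np.mean((np.asarray(policy_action) - np.asarray(behavior_action)) ** 2))
    # Return term: minimizing the negative estimated return maximizes it.
    return (1.0 - lam) * bc_loss + lam * (-estimated_return)
```

In this picture, a human expert sweeps `lam` by hand at runtime, while an autopilot in the spirit of AutoLION would instead search for a suitable `lam` automatically.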
DOI: 10.48550/arxiv.2306.09744