Automatic Trade-off Adaptation in Offline RL


Bibliographic Details
Main Authors: Swazinna, Phillip; Udluft, Steffen; Runkler, Thomas
Format: Journal Article
Language: English
Published: 16.06.2023

Summary: Recently, offline RL algorithms have been proposed that remain adaptive at runtime. For example, the LION algorithm \cite{lion} provides the user with an interface to set the trade-off between behavior cloning and optimality w.r.t. the estimated return at runtime. Experts can then use this interface to adapt the policy behavior according to their preferences and find a good trade-off between conservatism and performance optimization. Since expert time is precious, we extend the methodology with an autopilot that automatically finds the correct parameterization of the trade-off, yielding a new algorithm which we term AutoLION.
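The user-set trade-off described in the summary can be sketched as a convex combination of a behavior-cloning term and the negated estimated return. The function below is a hypothetical illustration only: the names, the mean-squared BC term, and the linear weighting scheme are assumptions for exposition, not LION's exact objective.

```python
import numpy as np

def trade_off_loss(policy_action, behavior_action, estimated_return, lam):
    """Hypothetical LION-style trade-off objective (illustrative only).

    lam = 1.0 -> pure behavior cloning (maximally conservative),
    lam = 0.0 -> pure maximization of the estimated return.
    """
    # Behavior-cloning term: mean squared distance to the logged action.
    bc_loss = np.mean((np.asarray(policy_action) - np.asarray(behavior_action)) ** 2)
    # Convex combination: minimizing this trades conservatism against return.
    return lam * bc_loss - (1.0 - lam) * estimated_return

# At lam = 1.0 the loss reduces to the BC term; at lam = 0.0 it reduces
# to the negated estimated return, so sweeping lam at runtime moves the
# policy between the two extremes.
loss_conservative = trade_off_loss([1.0, 2.0], [1.5, 2.5], 3.0, lam=1.0)
loss_greedy = trade_off_loss([1.0, 2.0], [1.5, 2.5], 3.0, lam=0.0)
```

In this reading, the autopilot contributed by AutoLION would select `lam` automatically instead of leaving it to the expert; the summary does not specify the selection mechanism, so it is not sketched here.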
DOI: 10.48550/arxiv.2306.09744