Offline Trajectory Generalization for Offline Reinforcement Learning
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 16.04.2024 |
Subjects | |
Summary: | Offline reinforcement learning (RL) aims to learn policies from static
datasets of previously collected trajectories. Existing methods for offline RL
either constrain the learned policy to the support of the offline data or use
model-based virtual environments to generate simulated rollouts. However, these
methods suffer from (i) poor generalization to unseen states and (ii) trivial
improvement from low-quality rollout simulations. In this paper, we propose
offline trajectory generalization through World Transformers for offline
reinforcement learning (OTTO). Specifically, we use causal Transformers, a.k.a.
World Transformers, to predict state dynamics and the immediate reward. We then
propose four strategies that use World Transformers to generate high-reward
simulated trajectories by perturbing the offline data. Finally, we jointly use
the offline data and the simulated data to train an offline RL algorithm. OTTO
serves as a plug-in module and can be integrated with existing offline RL
methods, enhancing them with the stronger generalization capability of
Transformers and high-reward data augmentation. Through extensive experiments on
D4RL benchmark datasets, we verify that OTTO significantly outperforms
state-of-the-art offline RL methods. |
---|---|
DOI: | 10.48550/arxiv.2404.10393 |
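
The summary above describes a pipeline of world-model prediction, perturbation-based trajectory generation, and joint training with an offline RL algorithm. As a rough, hypothetical illustration of the first two steps only, the sketch below shows a causal Transformer that predicts next states and immediate rewards from a (state, action) history, plus a simple action-perturbation rollout. All class names, dimensions, and the noise-based perturbation scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an OTTO-style world model and rollout step (assumptions only).
import torch
import torch.nn as nn


class WorldTransformer(nn.Module):
    """Causal Transformer that maps a (state, action) history to a predicted
    next state and immediate reward at every time step."""

    def __init__(self, state_dim, action_dim, d_model=128, n_layers=3, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.next_state_head = nn.Linear(d_model, state_dim)
        self.reward_head = nn.Linear(d_model, 1)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        x = self.embed(torch.cat([states, actions], dim=-1))
        T = x.size(1)
        # Additive causal mask so each step only attends to its past.
        causal_mask = torch.triu(
            torch.full((T, T), float("-inf"), device=x.device), diagonal=1
        )
        h = self.encoder(x, mask=causal_mask)
        return self.next_state_head(h), self.reward_head(h).squeeze(-1)


@torch.no_grad()
def perturb_and_rollout(world_model, states, actions, noise_scale=0.1):
    """Generate a simulated trajectory by perturbing the actions of an offline
    trajectory and re-predicting dynamics and rewards with the world model.
    (One of many possible perturbation strategies; purely illustrative.)"""
    perturbed_actions = actions + noise_scale * torch.randn_like(actions)
    next_states, rewards = world_model(states, perturbed_actions)
    return states, perturbed_actions, next_states, rewards


if __name__ == "__main__":
    B, T, S, A = 2, 10, 17, 6  # toy batch with MuJoCo-like dimensions (assumed)
    wm = WorldTransformer(S, A)
    s, a = torch.randn(B, T, S), torch.randn(B, T, A)
    _, _, ns, r = perturb_and_rollout(wm, s, a)
    print(ns.shape, r.shape)  # torch.Size([2, 10, 17]) torch.Size([2, 10])
```

In such a setup, the simulated transitions returned by the rollout step would be mixed with the original offline transitions before training whichever offline RL algorithm is used downstream.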