MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 20.02.2024 |
Summary: | Drawing upon the intuition that aligning different modalities to the same
semantic embedding space would allow models to understand states and actions
more easily, we propose a new perspective on the offline reinforcement learning
(RL) challenge. More concretely, we transform it into a supervised learning
task by integrating multimodal and pre-trained language models. Our approach
incorporates state information derived from images and action-related data
obtained from text, thereby bolstering RL training performance and promoting
long-term strategic thinking. We emphasize the contextual understanding of
language and demonstrate how decision-making in RL can benefit from aligning
state and action representations with language representations. Our method
significantly outperforms current baselines as evidenced by evaluations
conducted on Atari and OpenAI Gym environments. This contributes to advancing
offline RL performance and efficiency while providing a novel perspective on
offline RL. Our code and data are available at
https://github.com/Zheng0428/MORE_. |
---|---|
DOI: | 10.48550/arxiv.2402.12845 |