MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

Bibliographic Details
Main Authors	Zheng, Tianyu; Zhang, Ge; Qu, Xingwei; Kuang, Ming; Huang, Stephen W; He, Zhaofeng
Format	Journal Article
Language	English
Published	20.02.2024
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computer Science and Game Theory
Summary:	Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective on the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning the representations of states and actions with language representations. Our method significantly outperforms current baselines, as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This contributes to advancing offline RL performance and efficiency while providing a novel perspective on offline RL. Our code and data are available at https://github.com/Zheng0428/MORE_.
DOI:	10.48550/arxiv.2402.12845