CST-RL: Contrastive Spatio-Temporal Representations for Reinforcement Learning


Bibliographic Details
Published in: IEEE Access, Vol. 11, p. 1
Main Authors: Ho, Chi-Kai; King, Chung-Ta
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2023

Summary: Learning representations from high-dimensional observations is critical for training pixel-based continuous control tasks with reinforcement learning (RL). Without proper representations, training is inefficient, requiring long training times and large amounts of data to learn directly from low-level pixel observations, much of whose information may be redundant or irrelevant. A common remedy is to train auxiliary objectives alongside the main RL objective; the additional objectives provide extra learning signals, reduce training time, and improve sample efficiency. A representative work is Contrastive Unsupervised Representations for Reinforcement Learning (CURL), which leverages contrastive learning to help RL learn useful representations. Although CURL extracts spatial information from pixel inputs very well, it overlooks potential temporal signals. This paper introduces CST-RL, a contrastive spatio-temporal representation learning framework for RL that combines a 3D Convolutional Neural Network (3D CNN) with contrastive learning for sample-efficient RL, attending to both spatial and temporal signals in pixel observations. Experiments on DMControl show that CST-RL outperforms CURL in all six environments after 500K environment steps and needs only half as many steps to reach the standard score in the majority of cases.
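The contrastive objective that CURL-style methods use scores a query (anchor) representation against a positive key from an augmented view of the same observation, with other observations in the batch serving as negatives, typically via an InfoNCE loss with a bilinear similarity. The NumPy sketch below illustrates only that generic loss; the embedding size, batch values, and identity weight matrix are illustrative assumptions, not details from the paper, and the paper's 3D CNN encoder is omitted.

```python
import numpy as np

def info_nce_loss(queries, keys, W):
    """InfoNCE loss: each query's positive key is the same-index row of
    `keys`; all other rows act as negatives. Similarity is bilinear,
    q^T W k, as commonly used in CURL-style contrastive RL."""
    logits = queries @ W @ keys.T                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal; loss is mean negative log-likelihood.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
B, D = 8, 32                               # batch size, embedding dim (assumed)
q = rng.normal(size=(B, D))                # query embeddings (stand-in)
k = q + 0.05 * rng.normal(size=(B, D))     # positives: perturbed queries
W = np.eye(D)                              # bilinear weight (identity here)
loss = info_nce_loss(q, k, W)
```

Because each positive key is close to its query while negatives are nearly orthogonal, the diagonal logits dominate and the loss stays well below the chance value of log(B).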
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3258540