Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion
Format: Journal Article
Language: English
Published: 14.10.2022
Summary: Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. But training DRL models is incredibly compute and memory intensive, requiring large training datasets and replay buffers to achieve performant results. This poses a challenge for the next generation of field robots that will need to learn on the edge to adapt to their environment. In this paper, we begin to address this issue through observation space quantization. We evaluate our approach using four simulated robot locomotion tasks and two state-of-the-art DRL algorithms, the on-policy Proximal Policy Optimization (PPO) and the off-policy Soft Actor-Critic (SAC), and find that observation space quantization reduces overall memory costs by as much as 4.2x without impacting learning performance.
DOI: 10.48550/arxiv.2210.08065
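
The abstract does not spell out the mechanism, but the title ("Just Round") and the focus on replay-buffer memory suggest storing rounded, low-precision observations instead of full float32 vectors. Below is a minimal illustrative sketch, not the authors' implementation: it assumes uniform 8-bit rounding within known per-dimension observation bounds, and the class and method names (`QuantizedReplayBuffer`, `_quantize`, `_dequantize`) are hypothetical.

```python
import numpy as np

class QuantizedReplayBuffer:
    """Sketch of observation-space quantization for a replay buffer.

    Assumes each observation dimension lies in a known [low, high] range
    (e.g. taken from the environment's observation space). Storing uint8
    instead of float32 cuts observation memory by 4x; the overall savings
    reported in the paper (up to 4.2x) also depend on what else is stored.
    """

    def __init__(self, capacity, obs_low, obs_high):
        self.low = np.asarray(obs_low, dtype=np.float32)
        self.high = np.asarray(obs_high, dtype=np.float32)
        self.obs = np.zeros((capacity, self.low.size), dtype=np.uint8)
        self.capacity, self.idx, self.full = capacity, 0, False

    def _quantize(self, obs):
        # "Just round": map each value onto the nearest of 256 evenly spaced levels.
        scaled = (obs - self.low) / (self.high - self.low)
        return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

    def _dequantize(self, q):
        # Recover an approximate float32 observation for the learner.
        return self.low + (q.astype(np.float32) / 255.0) * (self.high - self.low)

    def add(self, obs):
        self.obs[self.idx] = self._quantize(np.asarray(obs, dtype=np.float32))
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size, rng=np.random):
        # Sample stored observations and dequantize them on the fly.
        n = self.capacity if self.full else self.idx
        rows = rng.randint(0, n, size=batch_size)
        return self._dequantize(self.obs[rows])
```

Under these assumptions the same buffer interface works for an off-policy learner like SAC, while on-policy methods like PPO could apply the same rounding to their rollout storage; the bit width and bounds handling here are illustrative choices, not those evaluated in the paper.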