PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 24.10.2024 |
Summary: Perceiving the environment via cameras is crucial for Reinforcement Learning (RL) in robotics. While images are a convenient form of representation, they often complicate extracting important geometric details, especially with varying geometries or deformable objects. In contrast, point clouds naturally represent this geometry and easily integrate color and positional data from multiple camera views. However, while deep learning on point clouds has seen many recent successes, RL on point clouds is under-researched, with only the simplest encoder architecture considered in the literature. We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common paradigm of dividing point clouds into overlapping patches, tokenizing them, and processing the tokens with transformers. PPRL provides significant improvements compared with other point-cloud processing architectures previously used for RL. We then complement PPRL with masked reconstruction for representation learning and show that our method outperforms strong model-free and model-based baselines on image observations in complex manipulation tasks containing deformable objects and variations in target object geometry. Videos and code are available at https://alrhub.github.io/pprl-website
DOI: 10.48550/arxiv.2410.18800
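The patch-tokenize-transformer pipeline and masked-reconstruction objective named in the abstract can be illustrated in a few lines of PyTorch. The following is a minimal sketch, not the authors' implementation: the names (`knn_patches`, `MaskedPointEncoder`), the random choice of patch centers (farthest point sampling would be the usual choice), the layer sizes, and the Chamfer-style loss are all assumptions made for brevity.

```python
# Illustrative sketch of a patch-based point-cloud encoder with a masked
# reconstruction objective. Hypothetical names and hyperparameters; only
# plain PyTorch is assumed.
import torch
import torch.nn as nn


def knn_patches(points, num_patches=16, patch_size=32):
    """Group a point cloud (B, N, 3) into overlapping patches (B, G, K, 3).

    Patch centers are drawn at random here for brevity; a paper-style
    pipeline would typically use farthest point sampling instead.
    """
    B, N, _ = points.shape
    idx = torch.randint(0, N, (B, num_patches))                  # (B, G)
    centers = torch.gather(points, 1, idx[..., None].expand(-1, -1, 3))
    dists = torch.cdist(centers, points)                         # (B, G, N)
    nn_idx = dists.topk(patch_size, largest=False).indices       # (B, G, K)
    patches = torch.gather(
        points[:, None].expand(-1, num_patches, -1, -1), 2,
        nn_idx[..., None].expand(-1, -1, -1, 3))
    return patches - centers[:, :, None], centers                # center-relative


class MaskedPointEncoder(nn.Module):
    def __init__(self, dim=128, patch_size=32, mask_ratio=0.6):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.tokenizer = nn.Sequential(                          # per-patch MLP
            nn.Linear(patch_size * 3, dim), nn.GELU(), nn.Linear(dim, dim))
        self.pos_embed = nn.Linear(3, dim)                       # center -> pos. enc.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=3)
        self.decoder = nn.Linear(dim, patch_size * 3)            # patch reconstructor

    def forward(self, points):
        patches, centers = knn_patches(points)                   # (B, G, K, 3)
        B, G, K, _ = patches.shape
        tokens = self.tokenizer(patches.flatten(2)) + self.pos_embed(centers)
        # Replace a random subset of patch tokens with a learned mask token;
        # the decoder must reconstruct the hidden patches from context.
        mask = torch.rand(B, G, device=points.device) < self.mask_ratio
        tokens = torch.where(
            mask[..., None], self.mask_token.expand(B, G, -1), tokens)
        latent = self.transformer(tokens)                        # (B, G, dim)
        recon = self.decoder(latent).view(B, G, K, 3)
        # Symmetric Chamfer distance on the masked patches only.
        d = torch.cdist(recon[mask], patches[mask])               # (M, K, K)
        loss = d.min(-1).values.mean() + d.min(-2).values.mean()
        return latent, loss


if __name__ == "__main__":
    enc = MaskedPointEncoder()
    latent, loss = enc(torch.randn(2, 512, 3))                   # toy point clouds
    print(latent.shape, loss.item())                             # latent feeds an RL policy
```

In this reading, the per-patch latent tokens would serve as the observation encoding for the RL policy, while the reconstruction loss acts as an auxiliary representation-learning signal.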