RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection
Format | Journal Article |
Language | English |
Published | 17.08.2021 |
Summary: | 3D point cloud understanding has made great progress in recent years.
However, one major bottleneck is the scarcity of annotated real datasets,
especially compared with 2D object detection, since annotating real scans of a
scene requires a large amount of labor. A promising solution to this problem is
to make better use of synthetic datasets, which consist of CAD object models,
to boost learning on real datasets. This can be achieved through a pre-training
and fine-tuning procedure. However, recent work on 3D pre-training fails when
transferring features learned on synthetic objects to other real-world
applications. In this work, we put forward a new method called RandomRooms to
accomplish this objective. In particular, we propose to generate random scene
layouts from the objects in a synthetic CAD dataset and to learn 3D scene
representations by applying object-level contrastive learning to two random
scenes generated from the same set of synthetic objects. A model pre-trained in
this way serves as a better initialization when later fine-tuned on the 3D
object detection task. Empirically, we show consistent improvements on
downstream 3D detection tasks across several base models, especially when less
training data is used, which strongly demonstrates the effectiveness and
generalization of our method. Benefiting from the rich semantic knowledge and
diverse objects in synthetic data, our method establishes a new state of the
art on the widely used 3D detection benchmarks ScanNetV2 and SUN RGB-D. We hope
our attempt provides a new perspective on bridging object- and scene-level 3D
understanding. |
DOI: | 10.48550/arxiv.2108.07794 |
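
The summary above describes the core recipe: build two random rooms from the
same set of synthetic CAD objects, then pull together the pooled features of
matching objects across the two scenes. The snippet below is a minimal sketch
of that idea in PyTorch, not the authors' implementation: the function names
`random_layout` and `object_contrastive_loss` are hypothetical, the layout
routine is reduced to random rotations and translations, and raw coordinates
stand in for the per-point features a real backbone (e.g. a PointNet++-style
encoder) would produce.

```python
# Hypothetical sketch of object-level contrastive pre-training on paired
# random rooms, loosely following the summary above.
import math
import random

import torch
import torch.nn.functional as F


def random_layout(objects):
    """Place the same set of CAD point clouds into one synthetic room
    using random up-axis rotations and random floor translations."""
    pts, ids = [], []
    for i, obj in enumerate(objects):  # obj: (N_i, 3) point cloud
        a = random.uniform(0.0, 2.0 * math.pi)
        rot = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                            [math.sin(a),  math.cos(a), 0.0],
                            [0.0,          0.0,         1.0]])
        shift = torch.tensor([random.uniform(-2.0, 2.0),
                              random.uniform(-2.0, 2.0), 0.0])
        pts.append(obj @ rot.T + shift)
        ids.append(torch.full((obj.shape[0],), i))  # object index per point
    return torch.cat(pts), torch.cat(ids)


def object_contrastive_loss(feat_a, ids_a, feat_b, ids_b, tau=0.1):
    """InfoNCE-style loss over per-object features pooled from two random
    rooms built from the same object set: the matching object in the paired
    scene is the positive, all other objects are negatives."""
    n = int(ids_a.max()) + 1
    za = torch.stack([feat_a[ids_a == i].mean(0) for i in range(n)])
    zb = torch.stack([feat_b[ids_b == i].mean(0) for i in range(n)])
    za, zb = F.normalize(za, dim=1), F.normalize(zb, dim=1)
    logits = za @ zb.t() / tau          # (n, n) object-to-object similarities
    return F.cross_entropy(logits, torch.arange(n))  # diagonal = positives


# Toy usage: raw coordinates stand in for backbone features.
objs = [torch.randn(128, 3) for _ in range(4)]  # four synthetic "CAD" objects
pts_a, ids_a = random_layout(objs)              # two rooms, same objects,
pts_b, ids_b = random_layout(objs)              # different random layouts
loss = object_contrastive_loss(pts_a, ids_a, pts_b, ids_b)
```

Because both rooms contain exactly the same objects under different layouts,
the loss encourages the encoder to produce object features invariant to scene
context, which is the property the pre-trained weights carry into 3D detection
fine-tuning.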