Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Natural language-based communication with mobile devices and home appliances is becoming increasingly popular and has the potential to become natural for communicating with mobile robots in the future. Towards this goal, we investigate cross-modal text-to-point-cloud localization that will allow us...
Saved in:
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
28.03.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Natural language-based communication with mobile devices and home appliances
is becoming increasingly popular and has the potential to become natural for
communicating with mobile robots in the future. Towards this goal, we
investigate cross-modal text-to-point-cloud localization that will allow us to
specify, for example, a vehicle pick-up or goods delivery location. In
particular, we propose Text2Pos, a cross-modal localization module that learns
to align textual descriptions with localization cues in a coarse- to-fine
manner. Given a point cloud of the environment, Text2Pos locates a position
that is specified via a natural language-based description of the immediate
surroundings. To train Text2Pos and study its performance, we construct
KITTI360Pose, the first dataset for this task based on the recently introduced
KITTI360 dataset. Our experiments show that we can localize 65% of textual
queries within 15m distance to query locations for top-10 retrieved locations.
This is a starting point that we hope will spark future developments towards
language-based navigation. |
---|---|
DOI: | 10.48550/arxiv.2203.15125 |