SPIN: Simultaneous Perception, Interaction and Navigation

Bibliographic Details
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18133-18142
Main Authors Uppal, Shagun; Agarwal, Ananye; Xiong, Haoyu; Shaw, Kenneth; Pathak, Deepak
Format Conference Proceeding
Language English
Published IEEE 16.06.2024
Summary: While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora of challenges in developing these systems such as coordination between the base and arm, reliance on onboard perception for perceiving and interacting with the environment, and most importantly, simultaneously integrating all these parts together. Prior works approach the problem using disentangled modular skills for mobility and manipulation that are trivially tied together. This causes several limitations such as compounding errors, delays in decision-making, and no whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see, more specifically - to move in order to see and to see in order to move. This allows it to not only move around and interact with its environment but also, choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate around complex cluttered scenarios while displaying agile whole-body coordination using only ego-vision without needing to create environment maps. Videos are available at https://spin-robot.github.io
ISSN: 2575-7075
DOI: 10.1109/CVPR52733.2024.01717