Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects

Bibliographic Details
Published in	IEEE Robotics and Automation Letters, Vol. 5, no. 2, pp. 2371-2378
Main Authors	Yan, Mengyuan; Zhu, Yilin; Jin, Ning; Bohg, Jeannette
Format	Journal Article
Language	English
Published	Piscataway IEEE 01.04.2020 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Annotations; Computer simulation; Deep learning in robotics and automation; Deformation; Formability; Mass-spring systems; Neural networks; Perception for grasping and manipulation; Physics; Predictive control; Robot control; Robots; Rope; Self-supervised learning; State estimation; State space models; Supervised learning; Training; Visual learning
Summary:	We demonstrate model-based, visual robot manipulation of deformable linear objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors in the dynamics model and perception model, and the ease of planning manipulation actions. In addition, physical states can naturally represent object instances of different appearances. Therefore, dynamics in the state space can be learned in one setting and directly used in other visually different settings. This is in contrast to dynamics learned in pixel space or latent space, where generalization across visual differences is not guaranteed. The challenges in taking the state-space approach are estimating the high-dimensional state of a deformable object from raw images, for which annotations are very expensive on real data, and finding a dynamics model that is accurate, generalizable, and efficient to compute. We are the first to demonstrate self-supervised training of rope state estimation on real images, without requiring expensive annotations. This is achieved by our novel self-supervised learning objective, which generalizes across a wide range of visual appearances. With estimated rope states, we train a fast and differentiable neural network dynamics model that encodes the physics of mass-spring systems. Our method predicts future states more accurately than models that do not involve explicit state estimation and do not use any physics prior, while using only 3% of the training data. We also show that our approach achieves more efficient manipulation, both in simulation and on a real robot, when used within a model predictive controller.
ISSN:	2377-3766
DOI:	10.1109/LRA.2020.2969931
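
Note:	The summary refers to a dynamics model that encodes the physics of mass-spring systems. As a rough illustration of that physics only, the sketch below advances a rope, modeled as a chain of 2D point masses joined by springs, by one time step. It is not the paper's learned neural network model; the function name `mass_spring_step`, the parameters `k`, `damping`, `mass`, and `dt`, and the semi-implicit Euler integrator are all assumptions made for this example.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def mass_spring_step(
    pos: List[Vec],
    vel: List[Vec],
    rest_len: float,
    k: float = 50.0,       # spring stiffness (assumed value)
    damping: float = 0.5,  # velocity damping coefficient (assumed value)
    mass: float = 1.0,     # mass of each node (assumed value)
    dt: float = 0.01,      # time step (assumed value)
) -> Tuple[List[Vec], List[Vec]]:
    """Advance a chain of spring-connected nodes by one semi-implicit Euler step."""
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]

    # Accumulate Hooke's-law forces between each pair of neighboring nodes.
    for i in range(n - 1):
        dx = pos[i + 1][0] - pos[i][0]
        dy = pos[i + 1][1] - pos[i][1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist == 0.0:
            continue  # coincident nodes: spring direction is undefined, skip
        f = k * (dist - rest_len)  # positive when the spring is stretched
        fx, fy = f * dx / dist, f * dy / dist
        forces[i][0] += fx
        forces[i][1] += fy
        forces[i + 1][0] -= fx
        forces[i + 1][1] -= fy

    # Update velocities first, then positions using the new velocities.
    new_pos: List[Vec] = []
    new_vel: List[Vec] = []
    for (px, py), (vx, vy), (fx, fy) in zip(pos, vel, forces):
        vx_new = vx + dt * (fx - damping * vx) / mass
        vy_new = vy + dt * (fy - damping * vy) / mass
        new_vel.append((vx_new, vy_new))
        new_pos.append((px + dt * vx_new, py + dt * vy_new))
    return new_pos, new_vel


if __name__ == "__main__":
    # A two-node rope stretched to twice its rest length contracts slightly.
    positions, velocities = mass_spring_step(
        [(0.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)], rest_len=1.0
    )
    print(positions, velocities)
```

In this sketch the rope state is simply the list of node positions and velocities, which is the kind of explicit physical state the summary contrasts with pixel-space or latent-space representations.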