Stage-Wise Learning of Reaching Using Little Prior Knowledge
Published in | Frontiers in Robotics and AI, Vol. 5, p. 110
---|---
Main Authors | , , ,
Format | Journal Article
Language | English
Published | Frontiers Media S.A., Switzerland, 01.10.2018
Summary | In some manipulation robotics environments, hand-programming a robot behavior is often intractable because the dynamics are difficult to model precisely and features that describe the variety of scene appearances are difficult to compute. Deep reinforcement learning methods partially alleviate this problem: they can dispense with hand-crafted features for the state representation and do not need pre-computed dynamics. However, they often encode prior information in the task definition through shaping rewards, which guide the robot toward goal-state areas but require engineering or human supervision and can lead to sub-optimal behavior. In this work we consider a complex robot reaching task with a large range of initial object and arm positions, and propose a new learning approach with minimal supervision. Inspired by developmental robotics, our method is a weakly-supervised stage-wise procedure of three tasks. First, the robot learns to fixate on the object with a two-camera system. Second, it learns hand-eye coordination by learning to fixate on its end-effector. Third, using the knowledge acquired in the previous steps, it learns to reach the object at different positions and from a large set of initial robot joint angles. Experiments in a simulated environment show that our stage-wise framework achieves reaching performance comparable to a supervised setting, without using kinematic models, hand-crafted features, calibration parameters, or supervised visual modules.
Bibliography | Edited by: Vieri Giuliano Santucci, Istituto di Scienze e Tecnologie della Cognizione (ISTC), Italy. This article was submitted to Computational Intelligence, a section of the journal Frontiers in Robotics and AI. Reviewed by: Kathryn Elizabeth Kasmarik, University of New South Wales Canberra, Australia; Carlos Maestre, Université Pierre et Marie Curie, France
ISSN | 2296-9144
DOI | 10.3389/frobt.2018.00110
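The stage-wise idea in the summary — learn ordered sub-tasks, with each stage initialized from the parameters of the previous one — can be sketched as a simple curriculum loop. This is a minimal illustration, not the authors' code: each stage is reduced to a toy 1-D reward, the RL algorithm is replaced by deterministic random-search hill climbing, and all names (`train_stage`, the stage labels) are hypothetical.

```python
import random

def train_stage(params, reward_fn, steps=500, step_size=0.1, seed=0):
    """Hill-climb `params` on `reward_fn`; a stand-in for one RL stage."""
    rng = random.Random(seed)
    best, best_r = list(params), reward_fn(params)
    for _ in range(steps):
        cand = [p + step_size * rng.uniform(-1, 1) for p in best]
        r = reward_fn(cand)
        if r > best_r:  # keep the candidate only if it improves the reward
            best, best_r = cand, r
    return best

# Three toy "tasks" whose rewards peak at different targets; each later
# stage starts from the parameters learned in the earlier ones (transfer).
stages = [
    ("object fixation",       lambda p: -abs(p[0] - 1.0)),
    ("end-effector fixation", lambda p: -abs(p[0] - 1.5)),
    ("reaching",              lambda p: -abs(p[0] - 2.0)),
]

params = [0.0]  # shared "policy" parameters carried across stages
for name, reward in stages:
    params = train_stage(params, reward)
    print(f"{name}: params = {params[0]:.2f}")
```

The design point the sketch preserves is that only the stage ordering and each stage's reward are supplied; the parameters are never reset between stages, so earlier skills seed the later, harder task.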