Learning by Watching via Keypoint Extraction and Imitation Learning

Bibliographic Details
Published in: Machines (Basel), Vol. 10, No. 11, p. 1049
Main Authors: Sun, Yin-Tung Albert; Lin, Hsin-Chang; Wu, Po-Yen; Huang, Jung-Tang
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.11.2022

Summary: In recent years, the use of reinforcement learning and imitation learning for robot control tasks has become increasingly popular. Learning from expert demonstration has long been a goal of researchers; however, the lack of action data has been a significant limitation on learning from human demonstrations. We propose an architecture based on a new 3D keypoint tracking model and generative adversarial imitation learning to learn from expert demonstrations. We used 3D keypoint tracking to compensate for the absence of action data in plain images, and then used image-to-image translation to convert human hand demonstrations into robot images, which enabled the subsequent generative adversarial imitation learning to proceed smoothly. The combined estimation time of the 3D keypoint tracking model and the subsequent optimization algorithm was 30 ms. Under correct detection, the coordinate errors of the model's projection onto the real 3D keypoints were all within 1.8 cm. Keypoint tracking required no sensors on the body, and the operator needed no vision-related expertise to calibrate the camera. By merely setting up a generic depth camera to track the keypoint trajectories, the robot could, after behavior-cloning training, learn human tasks by watching, including picking and placing an object and pouring water. We built an experimental environment in pybullet to validate the concept with the simplest behavioral-cloning imitation and confirm that the learning succeeds. The proposed method achieved satisfactory performance with a sample efficiency of 20 demonstration sets for pick-and-place and 30 sets for pouring water.
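
As a hedged illustration only: the summary describes mapping tracked 3D hand keypoints to robot actions via behavioral cloning, but this record does not give the paper's actual network, data format, or hyperparameters. The minimal PyTorch sketch below shows the general shape of such a pipeline; the 21-keypoint hand layout, the 7-DoF action size, the BCPolicy network, and the synthetic stand-in data are all assumptions for illustration, not the authors' code.

```python
# Hedged sketch of behavioral cloning from 3D keypoint demonstrations.
# Shapes, names, and data below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

N_KEYPOINTS = 21  # assumed hand-keypoint count (MediaPipe-style layout)
ACTION_DIM = 7    # assumed robot action size (e.g., a 7-DoF arm)

class BCPolicy(nn.Module):
    """Small MLP that regresses a robot action from flattened 3D keypoints."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_KEYPOINTS * 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, keypoints):
        return self.net(keypoints)

def train_bc(demo_obs, demo_act, epochs=50):
    """demo_obs: (N, 63) keypoint frames; demo_act: (N, 7) expert actions."""
    policy = BCPolicy()
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loader = DataLoader(TensorDataset(demo_obs, demo_act),
                        batch_size=64, shuffle=True)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in loader:
            optimizer.zero_grad()
            loss = loss_fn(policy(obs), act)  # regress expert actions
            loss.backward()
            optimizer.step()
    return policy

if __name__ == "__main__":
    # Synthetic stand-in data; real demonstrations would come from the
    # depth-camera keypoint tracker described in the summary.
    obs = torch.randn(2000, N_KEYPOINTS * 3)
    act = torch.randn(2000, ACTION_DIM)
    policy = train_bc(obs, act, epochs=5)
    print(policy(obs[:1]))
```

In the paper's full pipeline, image-to-image translation and generative adversarial imitation learning sit on top of this kind of baseline; the sketch covers only the simplest behavioral-cloning stage that the authors validated in pybullet.
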
ISSN: 2075-1702
DOI: 10.3390/machines10111049