Learning to Parse Natural Language to Grounded Reward Functions with Weak Supervision

Bibliographic Details
Published in: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4430-4436
Main Authors: Williams, Edward C.; Gopalan, Nakul; Rhee, Mina; Tellex, Stefanie
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2018

Summary: In order to intuitively and efficiently collaborate with humans, robots must learn to complete tasks specified using natural language. We represent natural language instructions as goal-state reward functions specified using lambda calculus. Using reward functions as language representations allows robots to plan efficiently in stochastic environments. To map sentences to such reward functions, we learn a weighted linear Combinatory Categorial Grammar (CCG) semantic parser. The parser, including both parameters and the CCG lexicon, is learned from a validation procedure that does not require executing a planner, annotating reward functions, or labeling parse trees, unlike prior approaches. To learn a CCG lexicon and parse weights, we use coarse lexical generation and validation-driven perceptron weight updates, following the approach of Artzi and Zettlemoyer [4]. We present results on the Cleanup World domain [18] to demonstrate the potential of our approach. We report an F1 score of 0.82 on a collected corpus of 23 tasks containing combinations of nested referential expressions, comparators, and object properties, with 2037 corresponding sentences. Our goal-condition learning approach yields an orders-of-magnitude improvement in computation time over a baseline that performs planning during learning, while achieving comparable results. Further, we conduct an experiment with just 6 labeled demonstrations to show the ease of teaching a robot behaviors using our method. We show that parsing models learned from small data sets can generalize to commands not seen during training.
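The validation-driven learning loop summarized above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the parser interface (a k-best CCG parser returning candidate logical forms with sparse feature vectors), the `validate` oracle (which checks a candidate lambda-calculus goal condition against the final state of a demonstration), and all names such as `Candidate`, `parse_k_best`, and `perceptron_epoch` are hypothetical stand-ins.

```python
# Sketch of a validation-driven structured-perceptron update for a weighted
# linear CCG semantic parser, in the style of Artzi and Zettlemoyer [4].
# All identifiers are hypothetical, not the paper's actual code.

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One CCG parse hypothesis for a sentence."""
    logical_form: str            # lambda-calculus goal condition, e.g.
                                 # "exists x. block(x) & in(x, room1)"
    features: dict = field(default_factory=dict)  # sparse phi(sentence, parse)


def score(weights, candidate):
    """Linear model score: w . phi(sentence, parse)."""
    return sum(weights[f] * v for f, v in candidate.features.items())


def perceptron_epoch(data, parse_k_best, validate, weights, k=10):
    """One pass of validation-driven perceptron updates.

    data is an iterable of (sentence, final_state) pairs. Validation only
    checks whether a candidate goal condition holds in the demonstrated
    final state, so no planner execution, annotated reward functions, or
    labeled parse trees are required.
    """
    for sentence, final_state in data:
        candidates = parse_k_best(sentence, weights, k)
        good = [c for c in candidates if validate(c.logical_form, final_state)]
        bad = [c for c in candidates if not validate(c.logical_form, final_state)]
        if not good or not bad:
            continue  # nothing to separate on this example
        best_good = max(good, key=lambda c: score(weights, c))
        best_bad = max(bad, key=lambda c: score(weights, c))
        # Promote features of the highest-scoring validated parse and
        # demote those of the highest-scoring invalid one.
        for f, v in best_good.features.items():
            weights[f] += v
        for f, v in best_bad.features.items():
            weights[f] -= v
    return weights
```

With `weights = defaultdict(float)` and domain-specific `parse_k_best` and `validate` callables, repeating `perceptron_epoch` over the corpus trains the weighted linear model; the learned goal condition can then serve as a goal-state reward function that returns 1 in satisfying states and 0 elsewhere, which a stochastic planner can optimize against.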
ISSN: 2577-087X
DOI: 10.1109/ICRA.2018.8460937