End-to-End Learning of Driving Models from Large-Scale Video Datasets

Bibliographic Details
Published in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3530-3538
Main Authors: Xu, Huazhe; Gao, Yang; Yu, Fisher; Darrell, Trevor
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2017

Summary: Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.
ISSN: 1063-6919
DOI: 10.1109/CVPR.2017.376
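
The summary describes an FCN-LSTM model that predicts a distribution over future vehicle egomotion from monocular frames and the previous vehicle state, with a scene-segmentation side task used as privileged supervision during training. The sketch below illustrates that general structure in PyTorch; the small convolutional backbone, 64-unit LSTM, 4-way discrete action space, and 19-class segmentation head are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of an FCN-LSTM driving policy: a fully convolutional
# encoder produces per-frame features, an LSTM fuses them with the previous
# vehicle state, and two heads predict (a) a distribution over future
# egomotion actions and (b) a semantic segmentation side task used only as
# a privileged training signal. All layer sizes are illustrative.
import torch
import torch.nn as nn


class FCNLSTMPolicy(nn.Module):
    def __init__(self, num_actions=4, num_seg_classes=19, state_dim=2, hidden=64):
        super().__init__()
        # Fully convolutional encoder (stand-in for a dilated FCN backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation side-task head (privileged learning signal).
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)
        # Temporal fusion of pooled visual features with the previous vehicle state.
        self.lstm = nn.LSTM(64 + state_dim, hidden, batch_first=True)
        # Distribution over future egomotion actions.
        self.action_head = nn.Linear(hidden, num_actions)

    def forward(self, frames, prev_state, hidden=None):
        # frames: (B, T, 3, H, W); prev_state: (B, T, state_dim)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))       # (B*T, 64, h, w)
        seg_logits = self.seg_head(feats)                 # side-task output
        pooled = feats.mean(dim=(2, 3)).view(B, T, -1)    # (B, T, 64)
        out, hidden = self.lstm(torch.cat([pooled, prev_state], dim=-1), hidden)
        action_logits = self.action_head(out)             # (B, T, num_actions)
        return action_logits, seg_logits, hidden
```

In such a setup, training would combine a cross-entropy loss on the action logits with a weighted segmentation loss on frames for which pixel labels exist, while at inference the segmentation head is simply ignored, which matches the privileged-learning framing in the summary.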