End-to-End Learning of Driving Models from Large-Scale Video Datasets

Bibliographic Details
Published in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3530-3538
Main Authors: Xu, Huazhe; Gao, Yang; Yu, Fisher; Darrell, Trevor
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2017

Summary: Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.
ISSN: 1063-6919
DOI: 10.1109/CVPR.2017.376
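
The summary describes an FCN-LSTM model that predicts a distribution over future vehicle egomotion from monocular frames and the previous vehicle state, with a scene-segmentation side task used as privileged supervision during training. The sketch below illustrates that general structure in PyTorch; the small convolutional backbone, 64-unit LSTM, 4-way discrete action space, and 19-class segmentation head are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of an FCN-LSTM driving policy: a fully convolutional
# encoder produces per-frame features, an LSTM fuses them with the previous
# vehicle state, and two heads predict (a) a distribution over future
# egomotion actions and (b) a semantic segmentation side task used only as
# a privileged training signal. All layer sizes are illustrative.
import torch
import torch.nn as nn


class FCNLSTMPolicy(nn.Module):
    def __init__(self, num_actions=4, num_seg_classes=19, state_dim=2, hidden=64):
        super().__init__()
        # Fully convolutional encoder (stand-in for a dilated FCN backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation side-task head (privileged learning signal).
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)
        # Temporal fusion of pooled visual features with the previous vehicle state.
        self.lstm = nn.LSTM(64 + state_dim, hidden, batch_first=True)
        # Distribution over future egomotion actions.
        self.action_head = nn.Linear(hidden, num_actions)

    def forward(self, frames, prev_state, hidden=None):
        # frames: (B, T, 3, H, W); prev_state: (B, T, state_dim)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))       # (B*T, 64, h, w)
        seg_logits = self.seg_head(feats)                 # side-task output
        pooled = feats.mean(dim=(2, 3)).view(B, T, -1)    # (B, T, 64)
        out, hidden = self.lstm(torch.cat([pooled, prev_state], dim=-1), hidden)
        action_logits = self.action_head(out)             # (B, T, num_actions)
        return action_logits, seg_logits, hidden
```

In such a setup, training would combine a cross-entropy loss on the action logits with a weighted segmentation loss on frames for which pixel labels exist, while at inference the segmentation head is simply ignored, which matches the privileged-learning framing in the summary.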