Are all objects equal? Deep spatio-temporal importance prediction in driving videos

Bibliographic Details
Published in: Pattern Recognition, Vol. 64, pp. 425-436
Main Authors: Ohn-Bar, Eshed; Trivedi, Mohan Manubhai
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.04.2017
Summary: Understanding intent and relevance of surrounding agents from video is an essential task for many applications in robotics and computer vision. The modeling and evaluation of contextual, spatio-temporal situation awareness is particularly important in the domain of intelligent vehicles, where a robot is required to smoothly navigate in a complex environment while also interacting with humans. In this paper, we address these issues by studying the task of on-road object importance ranking from video. First, human-centric object importance annotations are employed in order to analyze the relevance of a variety of multi-modal cues for the importance prediction task. A deep convolutional neural network model is used for capturing video-based contextual spatial and temporal cues of scene type, driving task, and object properties related to intent. Second, the proposed importance annotations are used for producing novel analysis of error types in image-based object detectors. Specifically, we demonstrate how cost-sensitive training, informed by the object importance annotations, results in improved detection performance on objects of higher importance. This insight is essential for an application where navigation mistakes are safety-critical, and the quality of automation and human–robot interaction is key.

• We study a notion of object relevance, as measured in a spatio-temporal context of driving a vehicle.
• Various spatio-temporal object and scene cues are analyzed for the task of object importance classification.
• Human-centric metrics are employed for evaluating object detection and studying data bias.
• Importance-guided training of object detectors is proposed, showing significant improvement over an importance-agnostic baseline.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2016.08.029