Joint Human Detection and Head Pose Estimation via Multistream Networks for RGB-D Videos


Bibliographic Details
Published in IEEE Signal Processing Letters Vol. 24; no. 11; pp. 1666-1670
Main Authors Guyue Zhang, Jun Liu, Hengduo Li, Yan Qiu Chen, Larry S. Davis
Format Journal Article
Language English
Published IEEE 01.11.2017
Summary: We propose a multistream, multitask deep network for joint human detection and head pose estimation in RGB-D videos. To achieve high accuracy, we jointly use appearance, shape, and motion information as inputs. Based on the depth information, we generate scale-invariant proposals, which are then fed into a novel contextual region of interest pooling (CRP) layer in our deep network. The CRP layer has two branches that handle contextual information for each subject. The proposed method outperforms state-of-the-art approaches on three public datasets.
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2017.2731952
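
The summary mentions scale-invariant proposals generated from depth but gives no details. The sketch below is an illustration only, not the authors' method: it assumes a pinhole camera, under which a box of fixed physical size projects to a pixel size inversely proportional to depth. The function name `proposal_at`, the default focal length, and the default box size are all hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def proposal_at(u: float, v: float, depth_m: float,
                focal_px: float = 525.0, box_m: float = 0.5) -> Box:
    """Return a square box (x0, y0, x1, y1) centered on pixel (u, v).

    Assumption (not from the paper): a pinhole camera, so an object of
    physical size box_m at distance depth_m spans
    focal_px * box_m / depth_m pixels in the image.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    side_px = focal_px * box_m / depth_m  # pixel size shrinks with distance
    half = side_px / 2.0
    return (u - half, v - half, u + half, v + half)


# The same physical box covers fewer pixels when the subject is farther away.
near = proposal_at(100.0, 100.0, depth_m=2.0, focal_px=500.0)
far = proposal_at(100.0, 100.0, depth_m=4.0, focal_px=500.0)
```

Because every box corresponds to the same physical extent, cropping each one and resizing it to a fixed resolution presents subjects to a network at a roughly constant scale, which is one plausible reading of "scale-invariant" here.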