Joint Human Detection and Head Pose Estimation via Multistream Networks for RGB-D Videos

Bibliographic Details
Published in	IEEE Signal Processing Letters, Vol. 24, no. 11, pp. 1666-1670
Main Authors	Guyue Zhang, Jun Liu, Hengduo Li, Yan Qiu Chen, Larry S. Davis
Format	Journal Article
Language	English
Published	IEEE, 1 November 2017
Subjects	Contextual ROI pooling (CRP); Head; head pose estimation; human detection; Machine learning; Pose estimation; Proposals; RGB-D image; scale invariant proposals; Shape; Videos

Summary:	We propose a multistream multitask deep network for joint human detection and head pose estimation in RGB-D videos. To achieve high accuracy, we jointly utilize appearance, shape, and motion information as inputs. Based on the depth information, we generate scale-invariant proposals, which are then fed into a novel contextual region of interest pooling (CRP) layer in our deep network. The CRP layer has two branches to deal with contextual information for each subject. The proposed method outperforms state-of-the-art approaches on three public datasets.
ISSN:	1070-9908, 1558-2361
DOI:	10.1109/LSP.2017.2731952