Joint Human Detection and Head Pose Estimation via Multistream Networks for RGB-D Videos
We propose a multistream multitask deep network for joint human detection and head pose estimation in RGB-D videos. To achieve high accuracy, we jointly utilize appearance, shape, and motion information as inputs. Based on the depth information, we generate scale invariant proposals, which are then...
Saved in:
Published in | IEEE signal processing letters Vol. 24; no. 11; pp. 1666 - 1670 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published |
IEEE
01.11.2017
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | We propose a multistream multitask deep network for joint human detection and head pose estimation in RGB-D videos. To achieve high accuracy, we jointly utilize appearance, shape, and motion information as inputs. Based on the depth information, we generate scale invariant proposals, which are then fed into a novel contextual region of interest pooling (CRP) layer in our deep network. This CRP has two branches to deal with contextual information for each subject. The proposed method outperforms state-of-the-art approaches on three public datasets. |
---|---|
ISSN: | 1070-9908 1558-2361 |
DOI: | 10.1109/LSP.2017.2731952 |