The challenge of simultaneous object detection and pose estimation: A comparative study
Published in | Image and vision computing Vol. 79; pp. 109 - 122 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.11.2018 |
Summary: | Detecting objects and estimating their pose remains one of the major challenges for the computer vision research community. There is a trade-off between localizing objects and estimating their viewpoints: the detector ideally needs to be view-invariant, while the pose estimation process should be able to generalize at the category level. This work explores the use of deep learning models to solve both problems simultaneously. To do so, we propose three novel deep learning architectures that perform joint detection and pose estimation, and in which we gradually decouple the two tasks. We also investigate whether pose estimation should be solved as a classification or a regression problem, which is still an open question in the computer vision community. We present a detailed comparative analysis of all our solutions and of the methods that currently define the state of the art for this problem. We use the PASCAL3D+ and ObjectNet3D datasets for a thorough experimental evaluation and to present the main results. With the proposed models we achieve state-of-the-art performance on both datasets.
• The simultaneous object detection and pose estimation problem is addressed.
• Three new deep learning models are evaluated on PASCAL3D+ and ObjectNet3D.
• A thorough comparison with state-of-the-art solutions is carried out.
• The new networks achieve state-of-the-art performance on both datasets.
• Decoupling detection from viewpoint estimation has benefits. |
---|---|
ISSN: | 0262-8856 1872-8138 |
DOI: | 10.1016/j.imavis.2018.09.013 |