The challenge of simultaneous object detection and pose estimation: A comparative study

Detecting objects and estimating their pose remains as one of the major challenges of the computer vision research community. There exists a compromise between localizing the objects and estimating their viewpoints. The detector ideally needs to be view-invariant, while the pose estimation process s...

Full description

Saved in:

Bibliographic Details
Published in	Image and vision computing Vol. 79; pp. 109 - 122
Main Authors	Oñoro-Rubio, Daniel, López-Sastre, Roberto J., Redondo-Cabrera, Carolina, Gil-Jiménez, Pedro
Format	Journal Article
Language	English
Published	Elsevier B.V 01.11.2018
Subjects	Convolutional neural network Deep learning Object detection Pose estimation Viewpoint estimation Deep learning Pose estimation Convolutional neural network Viewpoint estimation Object detection
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Detecting objects and estimating their pose remains as one of the major challenges of the computer vision research community. There exists a compromise between localizing the objects and estimating their viewpoints. The detector ideally needs to be view-invariant, while the pose estimation process should be able to generalize towards the category-level. This work is an exploration of using deep learning models for solving both problems simultaneously. For doing so, we propose three novel deep learning architectures, which are able to perform a joint detection and pose estimation, where we gradually decouple the two tasks. We also investigate whether the pose estimation problem should be solved as a classification or regression problem, being this still an open question in the computer vision community. We detail a comparative analysis of all our solutions and the methods that currently define the state of the art for this problem. We use PASCAL3D+ and ObjectNet3D datasets to present the thorough experimental evaluation and main results. With the proposed models we achieve the state-of-the-art performance in both datasets. [Display omitted] •Simultaneous object detection and pose estimation problem is addressed.•Three new deep learning models are evaluated on PASCAL3D+ and ObjectNet3D.•A thorough comparison with state-of-the-art solutions is carried.•The new networks achieve state-of-the-art performance on both datasets•Decoupling the detection from the viewpoint estimation have benefits.
ISSN:	0262-8856 1872-8138
DOI:	10.1016/j.imavis.2018.09.013