The challenge of simultaneous object detection and pose estimation: A comparative study

Bibliographic Details
Published in: Image and Vision Computing, Vol. 79, pp. 109-122
Main Authors: Oñoro-Rubio, Daniel; López-Sastre, Roberto J.; Redondo-Cabrera, Carolina; Gil-Jiménez, Pedro
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2018
Summary: Detecting objects and estimating their pose remains one of the major challenges for the computer vision research community. There is a trade-off between localizing objects and estimating their viewpoints: the detector ideally needs to be view-invariant, while the pose estimation process should be able to generalize to the category level. This work explores the use of deep learning models to solve both problems simultaneously. To do so, we propose three novel deep learning architectures that perform joint detection and pose estimation, in which we gradually decouple the two tasks. We also investigate whether pose estimation should be solved as a classification or a regression problem, which is still an open question in the computer vision community. We present a comparative analysis of all our solutions and of the methods that currently define the state of the art for this problem. The thorough experimental evaluation and main results are reported on the PASCAL3D+ and ObjectNet3D datasets. With the proposed models we achieve state-of-the-art performance on both datasets.

Highlights:
• The simultaneous object detection and pose estimation problem is addressed.
• Three new deep learning models are evaluated on PASCAL3D+ and ObjectNet3D.
• A thorough comparison with state-of-the-art solutions is carried out.
• The new networks achieve state-of-the-art performance on both datasets.
• Decoupling detection from viewpoint estimation has benefits.
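The summary raises the open question of whether viewpoint should be predicted as a discrete class or as a continuous value. The Python sketch below contrasts the two target encodings for a single azimuth angle. It is an illustration only, not the architecture, loss, or discretization used in the paper; the azimuth-only viewpoint, the degree convention, the 8-bin default, and all function names are hypothetical choices made for this example.

```python
import math


def angular_error_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


# --- Pose as classification (hypothetical binning): equal sectors of the circle ---
def azimuth_to_bin(azimuth_deg: float, num_bins: int = 8) -> int:
    """Map an azimuth to a class index; bin k is centered on k * 360 / num_bins."""
    width = 360.0 / num_bins
    # Shift by half a bin so that bin 0 is centered on 0 degrees.
    return int(((azimuth_deg % 360.0) + width / 2.0) // width) % num_bins


def bin_to_azimuth(bin_index: int, num_bins: int = 8) -> float:
    """Decode a predicted class back to the azimuth at the bin center."""
    return (bin_index % num_bins) * (360.0 / num_bins)


# --- Pose as regression (hypothetical encoding): a point on the unit circle ---
def azimuth_to_vector(azimuth_deg: float) -> tuple[float, float]:
    """Continuous (cos, sin) target; avoids the jump between 359 and 0 degrees."""
    r = math.radians(azimuth_deg)
    return math.cos(r), math.sin(r)


def vector_to_azimuth(c: float, s: float) -> float:
    """Decode a (possibly unnormalized) regressed vector to degrees in [0, 360)."""
    return math.degrees(math.atan2(s, c)) % 360.0


if __name__ == "__main__":
    gt = 350.0
    k = azimuth_to_bin(gt)  # 350 degrees falls in the bin centered on 0 degrees
    print("class:", k, "decoded:", bin_to_azimuth(k))
    print("quantization error:", angular_error_deg(gt, bin_to_azimuth(k)))
    c, s = azimuth_to_vector(gt)
    print("regression round trip:", round(vector_to_azimuth(c, s), 6))
```

With classification, even a correct bin prediction can be off by up to half a bin width (22.5 degrees with 8 bins), whereas the (cos, sin) regression target is continuous and wraps around naturally; which behaves better in practice is exactly the kind of question the comparative study addresses.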
ISSN: 0262-8856, 1872-8138
DOI: 10.1016/j.imavis.2018.09.013