J-MOD2: Joint Monocular Obstacle Detection and Depth Estimation

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 3, No. 3, pp. 1490-1497
Main Authors: Mancini, Michele; Costante, Gabriele; Valigi, Paolo; Ciarfuglia, Thomas A.
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2018
Summary: In this letter, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most existing approaches rely either on visual simultaneous localization and mapping (SLAM) systems or on depth estimation models to build three-dimensional maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multitask architectures to perform both scene understanding and depth estimation. We follow their path and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, while maintaining compatibility with a global SLAM system if needed. The network architecture is devised to jointly exploit the information learned from the obstacle detection task, which produces reliable bounding boxes, and from the depth estimation task, increasing the robustness of both to scenario changes. We call this architecture J-MOD2. We test the effectiveness of our approach with experiments on sequences with different appearances and focal lengths, and compare it to state-of-the-art multitask methods that perform both semantic segmentation and depth estimation. In addition, we show the integration in a full system using a set of simulated navigation experiments, in which a micro aerial vehicle explores an unknown scenario and plans safe trajectories by using our detection model.
ISSN: 2377-3766
DOI: 10.1109/LRA.2018.2800083