J-MOD2: Joint Monocular Obstacle Detection and Depth Estimation

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 3, No. 3, pp. 1490-1497
Main Authors: Mancini, Michele; Costante, Gabriele; Valigi, Paolo; Ciarfuglia, Thomas A.
Format: Journal Article
Language: English
Published: Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.07.2018
Subjects: Barriers; Cameras; Computer architecture; Computer simulation; Estimation; Feature extraction; Micro air vehicles (MAV); Obstacle avoidance; Range sensing; Robustness; Scene analysis; Semantic segmentation; Simultaneous localization and mapping; Task analysis; Three-dimensional models; Three-dimensional displays; Visual flight; visual learning; visual-based navigation
ISSN: 2377-3766
DOI: 10.1109/LRA.2018.2800083

Abstract: In this letter, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most of the existing approaches rely either on visual simultaneous localization and mapping (SLAM) systems or on depth estimation models to build three-dimensional maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multitask architectures to perform both scene understanding and depth estimation. We follow their path and propose a specific architecture to jointly estimate depth and detect obstacles, without the need to compute a global map, but maintaining compatibility with a global SLAM system if needed. The network architecture is devised to jointly exploit the information learned from the obstacle detection task, which produces reliable bounding boxes, and the depth estimation one, increasing the robustness of both to scenario changes. We call this architecture J-MOD2. We test the effectiveness of our approach with experiments on sequences with different appearance and focal lengths and compare it to state-of-the-art multitask methods that perform both semantic segmentation and depth estimation. In addition, we show the integration in a full system using a set of simulated navigation experiments, where a micro aerial vehicle explores an unknown scenario and plans safe trajectories by using our detection model.
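The abstract describes a multitask network in which a shared feature extractor feeds two output branches: one regressing a dense depth map and one predicting obstacle bounding boxes. The following is a minimal, hypothetical PyTorch sketch of that general shared-encoder, two-head pattern; the backbone, layer sizes, detection grid, and output parameterization are illustrative assumptions, not the exact J-MOD2 design from the paper.

# Illustrative sketch of a joint obstacle-detection + depth-estimation network.
# The encoder, head sizes, and detection grid below are assumptions made for
# illustration; they are NOT the exact J-MOD2 architecture.
import torch
import torch.nn as nn

class JointObstacleDepthNet(nn.Module):
    def __init__(self, grid_h=8, grid_w=10, box_params=5):
        super().__init__()
        # Shared convolutional encoder (assumed small generic backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Depth head: upsamples back to input resolution, one value per pixel.
        self.depth_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )
        # Detection head: coarse grid where each cell predicts one candidate
        # obstacle box as (confidence, x, y, w, h).
        self.det_head = nn.Sequential(
            nn.AdaptiveAvgPool2d((grid_h, grid_w)),
            nn.Conv2d(128, box_params, 1),
        )

    def forward(self, x):
        feats = self.encoder(x)          # shared features feed both tasks
        depth = self.depth_head(feats)   # dense depth map, (B, 1, H, W)
        boxes = self.det_head(feats)     # obstacle grid, (B, 5, grid_h, grid_w)
        return depth, boxes

# Smoke test on a 256x320 RGB image (arbitrary example size).
net = JointObstacleDepthNet()
depth, boxes = net(torch.randn(1, 3, 256, 320))
print(depth.shape, boxes.shape)  # (1, 1, 256, 320) and (1, 5, 8, 10)

Sharing the encoder is what couples the two tasks: gradients from the detection loss and the depth loss both update the same features, which is the mechanism the abstract credits for the increased robustness of both outputs to scenario changes.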
Author Details:
– Mancini, Michele; ORCID: 0000-0003-0845-2813; email: michele.mancini@unipg.it; Department of Engineering, University of Perugia, Perugia, Italy
– Costante, Gabriele; ORCID: 0000-0002-8417-9372; email: gabriele.costante@unipg.it; Department of Engineering, University of Perugia, Perugia, Italy
– Valigi, Paolo; ORCID: 0000-0002-0486-7678; email: paolo.valigi@unipg.it; Department of Engineering, University of Perugia, Perugia, Italy
– Ciarfuglia, Thomas A.; ORCID: 0000-0001-8646-8197; email: thomas.ciarfuglia@unipg.it; Department of Engineering, University of Perugia, Perugia, Italy
CODEN: IRALC6
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2018
Indexed in: IEEE All-Society Periodicals Package (ASPP) 2005–Present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional
Discipline: Engineering
EISSN: 2377-3766
Genre: Original research
Grant Information: NVIDIA (funder ID 10.13039/100007065)
Peer Reviewed: Yes
Scholarly: Yes
Page Count: 8
Publication Title Abbreviation: LRA
URI: https://ieeexplore.ieee.org/document/8276580
  https://www.proquest.com/docview/2299371111