Toward Domain Independence for Learning-Based Monocular Depth Estimation

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 2, no. 3, pp. 1778–1785
Main Authors: Mancini, Michele; Costante, Gabriele; Valigi, Paolo; Ciarfuglia, Thomas A.; Delmerico, Jeffrey; Scaramuzza, Davide
Format: Journal Article
Language: English
Published: IEEE, 01.07.2017
Subjects

Abstract Modern autonomous mobile robots require a strong understanding of their surroundings in order to safely operate in cluttered and dynamic environments. Monocular depth estimation offers a geometry-independent paradigm to detect free, navigable space with minimum space and power consumption. These represent highly desirable features, especially for micro aerial vehicles. In order to guarantee robust operation in real-world scenarios, the estimator is required to generalize well in diverse environments. Most existing depth estimators do not consider generalization, and only benchmark their performance on publicly available datasets after specific fine-tuning. Generalization can be achieved by training on several heterogeneous datasets, but their collection and labeling is costly. In this letter, we propose a deep neural network for scene depth estimation that is trained on synthetic datasets, which allow inexpensive generation of ground-truth data. We show how this approach is able to generalize well across different scenarios. In addition, we show how the addition of long short-term memory (LSTM) layers in the network helps to alleviate, in sequential image streams, some of the intrinsic limitations of monocular vision, such as global scale estimation, with low computational overhead. We demonstrate that the network generalizes well to different real-world environments without any fine-tuning, achieving performance comparable to state-of-the-art methods on the KITTI dataset.
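The letter's key architectural idea, feeding per-frame CNN features through long short-term memory (LSTM) layers so that temporal context across an image stream can help resolve global scale, can be sketched as follows. This is a minimal, hypothetical illustration (random weights, made-up feature sizes, a generic LSTM step after Hochreiter and Schmidhuber, 1997), not the authors' actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Generic single-layer LSTM step over feature vectors (illustrative only)."""

    def __init__(self, input_size, hidden_size):
        scale = 1.0 / np.sqrt(hidden_size)
        # One stacked weight matrix covering the input, forget, cell, and output gates.
        self.W = rng.uniform(-scale, scale,
                             (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # updated cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        return h, c

# A 5-frame monocular stream, stood in for by hypothetical per-frame CNN features.
features = [rng.standard_normal(64) for _ in range(5)]

cell = LSTMCell(input_size=64, hidden_size=32)
h = np.zeros(32)
c = np.zeros(32)
for x in features:
    # The recurrent state carried across frames is what lets sequential
    # observations inform per-frame predictions (e.g., global scale).
    h, c = cell.step(x, h, c)

print(h.shape)  # (32,)
```

In the letter's setting, a state like `h` would condition the depth decoder; here it simply demonstrates how a constant-size recurrent state accumulates information over a frame sequence at low computational cost.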
Author: Costante, Gabriele; Mancini, Michele; Ciarfuglia, Thomas A.; Valigi, Paolo; Delmerico, Jeffrey; Scaramuzza, Davide
Author_xml – sequence: 1
  givenname: Michele
  surname: Mancini
  fullname: Mancini, Michele
  email: michele.mancini@unipg.it
  organization: Dept. of Eng., Univ. of Perugia, Perugia, Italy
– sequence: 2
  givenname: Gabriele
  surname: Costante
  fullname: Costante, Gabriele
  email: gabriele.costante@unipg.it
  organization: Dept. of Eng., Univ. of Perugia, Perugia, Italy
– sequence: 3
  givenname: Paolo
  surname: Valigi
  fullname: Valigi, Paolo
  email: paolo.valigi@unipg.it
  organization: Dept. of Eng., Univ. of Perugia, Perugia, Italy
– sequence: 4
  givenname: Thomas A.
  surname: Ciarfuglia
  fullname: Ciarfuglia, Thomas A.
  email: thomas.ciarfuglia@unipg.it
  organization: Dept. of Eng., Univ. of Perugia, Perugia, Italy
– sequence: 5
  givenname: Jeffrey
  surname: Delmerico
  fullname: Delmerico, Jeffrey
  email: jeffdelmerico@ifi.uzh.ch
  organization: Robot. & Perception Group, Univ. of Zurich, Zurich, Switzerland
– sequence: 6
  givenname: Davide
  surname: Scaramuzza
  fullname: Scaramuzza, Davide
  email: sdavide@ifi.uzh.ch
  organization: Robot. & Perception Group, Univ. of Zurich, Zurich, Switzerland
CODEN IRALC6
ContentType Journal Article
DBID 97E
RIA
RIE
AAYXX
CITATION
DOI 10.1109/LRA.2017.2657002
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 2377-3766
EndPage 1785
ExternalDocumentID 10_1109_LRA_2017_2657002
7829276
Genre orig-research
GrantInformation_xml – fundername: M.I.U.R. (Ministero dell’Istruzione dell’Università e della Ricerca)
  grantid: SCN_398/SEAL
– fundername: DARPA FLA Program
IEDL.DBID RIE
ISSN 2377-3766
IngestDate Tue Jul 01 03:53:52 EDT 2025
Thu Apr 24 22:53:59 EDT 2025
Tue Aug 26 16:57:02 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
LinkModel DirectLink
OpenAccessLink https://www.zora.uzh.ch/id/eprint/138918/1/RAL17_Mancini.pdf
PageCount 8
PublicationCentury 2000
PublicationDate 2017-July
2017-7-00
PublicationDateYYYYMMDD 2017-07-01
PublicationDate_xml – month: 07
  year: 2017
  text: 2017-July
PublicationDecade 2010
PublicationTitle IEEE robotics and automation letters
PublicationTitleAbbrev LRA
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
References ref13
ref12
ref14
ref31
deng (ref22) 0
ref10
simonyan (ref21) 2014
ref1
ref16
ref18
pinggera (ref3) 2014
eigen (ref7) 0
richter (ref2) 2016
fischer (ref19) 2015
badrinarayanan (ref20) 2015
ref24
davies (ref4) 2004
ref26
silberman (ref30) 2012
engel (ref15) 2014
ref28
ref27
chatfield (ref23) 2014
sharma (ref25) 2015
ref8
(ref29) 0
saxena (ref17) 0
ref9
garg (ref11) 2016
ref6
ref5
References_xml – start-page: 834
  year: 2014
  ident: ref15
  article-title: LSD-SLAM: Large-scale direct monocular SLAM
  publication-title: European Conference on Computer Vision
– year: 2014
  ident: ref23
  article-title: Return of the devil in the details: Delving deep into convolutional nets
  publication-title: arXiv preprint arXiv:1405.3531
– ident: ref5
  doi: 10.1109/IROS.2015.7353537
– start-page: 649
  year: 2016
  ident: ref2
  article-title: Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments
  publication-title: Robotics Research
  doi: 10.1007/978-3-319-28872-7_37
– ident: ref13
  doi: 10.1023/A:1014573219977
– year: 2014
  ident: ref21
  article-title: Very deep convolutional networks for large-scale image recognition
  publication-title: arXiv preprint arXiv:1409.1556
– year: 2015
  ident: ref20
  article-title: SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling
  publication-title: arXiv preprint arXiv:1505.03561
– ident: ref1
  doi: 10.1109/ICRA.2015.7138979
– year: 2004
  ident: ref4
  publication-title: Machine Vision Theory Algorithms Practicalities
– ident: ref18
  doi: 10.1109/ICCV.2015.304
– start-page: 746
  year: 2012
  ident: ref30
  article-title: Indoor segmentation and support inference from RGBD images
  publication-title: Computer Vision
– year: 2015
  ident: ref19
  article-title: FlowNet: Learning optical flow with convolutional networks
– ident: ref26
  doi: 10.1109/CVPR.2016.148
– ident: ref31
  doi: 10.1109/TPAMI.2007.1166
– ident: ref6
  doi: 10.1109/IROS.2015.7353448
– ident: ref9
  doi: 10.1109/CVPR.2016.594
– ident: ref12
  doi: 10.1109/IROS.2016.7759632
– ident: ref24
  doi: 10.1109/ICASSP.2015.7178838
– start-page: 1161
  year: 0
  ident: ref17
  article-title: Learning depth from single monocular images
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref27
  doi: 10.1162/neco.1997.9.8.1735
– year: 2016
  ident: ref11
  article-title: Unsupervised CNN for single view depth estimation: Geometry to the rescue
  doi: 10.1007/978-3-319-46484-8_45
– ident: ref16
  doi: 10.1109/ICCV.2011.6126513
– ident: ref14
  doi: 10.1109/ICRA.2014.6907233
– start-page: 2366
  year: 0
  ident: ref7
  article-title: Depth map prediction from a single image using a multi-scale deep network
  publication-title: Proc Adv Neural Inf Process Syst
– ident: ref28
  doi: 10.1177/0278364913491297
– start-page: 96
  year: 2014
  ident: ref3
  article-title: Know your limits: Accuracy of long range stereoscopic object measurements in practice
  publication-title: Computer Vision
– year: 0
  ident: ref29
– ident: ref10
  doi: 10.1109/TPAMI.2008.132
– ident: ref8
  doi: 10.1109/TPAMI.2015.2505283
– year: 2015
  ident: ref25
  article-title: Action recognition using visual attention
  publication-title: arXiv preprint arXiv:1511.05271
– start-page: 248
  year: 0
  ident: ref22
  article-title: ImageNet: A large-scale hierarchical image database
  publication-title: Proc IEEE Conf Comput Vis Pattern Recog
SSID ssj0001527395
Score 2.318378
SourceID crossref
ieee
SourceType Enrichment Source
Index Database
Publisher
StartPage 1778
SubjectTerms Benchmark testing
Cameras
Collision avoidance
Estimation
Feature extraction
range sensing
Streaming media
Training
Vehicles
visual-based navigation
Title Toward Domain Independence for Learning-Based Monocular Depth Estimation
URI https://ieeexplore.ieee.org/document/7829276
Volume 2
hasFullText 1
inHoldings 1
isFullTextHit
isPrint