Instance-Aware Multi-Object Self-Supervision for Monocular Depth Prediction

This letter proposes a self-supervised monocular image-to-depth prediction framework that is trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances...


Saved in:
Bibliographic Details
Published in IEEE robotics and automation letters Vol. 7; no. 4; pp. 10962 - 10968
Main Authors Boulahbal, Houssem Eddine, Voicila, Adrian, Comport, Andrew I.
Format Journal Article
Language English
Published Piscataway IEEE 01.10.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text

Abstract This letter proposes a self-supervised monocular image-to-depth prediction framework that is trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances. Self-supervision is performed by warping the images across a video sequence using depth and scene motion including object instances. One novelty of the proposed method is the use of the multi-head attention of the transformer network that matches moving objects across time and models their interaction and dynamics. This enables accurate and robust pose estimation for each object instance. Most image-to-depth prediction frameworks make the assumption of rigid scenes, which largely degrades their performance with respect to dynamic objects. Only a few state-of-the-art (SOTA) papers have accounted for dynamic objects. The proposed method is shown to outperform these methods on standard benchmarks and the impact of the dynamic motion on these benchmarks is exposed. Furthermore, the proposed image-to-depth prediction framework is also shown to be competitive with SOTA video-to-depth prediction frameworks.
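The warping-based self-supervision described in the abstract can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch illustration (not the authors' implementation): it backprojects pixels with the predicted depth, moves static pixels with the 6-DOF camera motion, moves pixels inside each instance mask with that object's own 6-DOF motion, reprojects, and compares the warped source frame to the target frame with a plain L1 photometric term. All names (backproject, warp_with_instances, photometric_loss), tensor shapes, and the choice of composing object motion with camera motion are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every pixel to a 3-D point: X = D * K^-1 * [u, v, 1]^T."""
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack((u, v, torch.ones_like(u)), dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix.expand(b, -1, -1)          # (B, 3, H*W)
    return rays * depth.reshape(b, 1, -1)         # scaled by predicted depth


def warp_with_instances(src_img, depth, K, T_cam, T_objs, masks):
    """Synthesize the target view from the source image.

    Static pixels follow the camera motion T_cam (B, 4, 4); pixels inside
    instance mask n additionally follow that object's own 6-DOF motion
    T_objs[:, n], with T_objs of shape (B, N, 4, 4) and masks (B, N, H, W).
    """
    b, _, h, w = depth.shape
    pts = backproject(depth, torch.linalg.inv(K))                 # (B, 3, H*W)
    pts_h = torch.cat((pts, torch.ones_like(pts[:, :1])), dim=1)  # (B, 4, H*W)

    # Rigid-scene hypothesis first, then overwrite each moving instance.
    warped = T_cam @ pts_h
    for n in range(masks.shape[1]):
        m = masks[:, n].reshape(b, 1, -1)
        warped = m * ((T_cam @ T_objs[:, n]) @ pts_h) + (1.0 - m) * warped

    # Project to the image plane and bilinearly sample the source image.
    cam = K @ warped[:, :3]
    uv = cam[:, :2] / cam[:, 2:3].clamp(min=1e-6)
    grid = torch.stack((2.0 * uv[:, 0] / (w - 1) - 1.0,
                        2.0 * uv[:, 1] / (h - 1) - 1.0), dim=-1)
    return F.grid_sample(src_img, grid.reshape(b, h, w, 2),
                         padding_mode="border", align_corners=True)


def photometric_loss(tgt_img, src_img, depth, K, T_cam, T_objs, masks):
    """L1 photometric error between the target frame and the warped source."""
    rec = warp_with_instances(src_img, depth, K, T_cam, T_objs, masks)
    return (tgt_img - rec).abs().mean()
```

In the paper, the per-instance poses would be produced by the transformer-based pose network described in the abstract; in this sketch they are simply taken as inputs.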
Author Voicila, Adrian
Boulahbal, Houssem Eddine
Comport, Andrew I.
Author_xml – sequence: 1
  givenname: Houssem Eddine
  orcidid: 0000-0003-4151-5281
  surname: Boulahbal
  fullname: Boulahbal, Houssem Eddine
  email: boulahbal@unice.fr
  organization: Renault Software Factory and CNRS-I3S, Côte d'Azur University, 2600 Rte des Crêtes, Valbonne, France
– sequence: 2
  givenname: Adrian
  orcidid: 0000-0002-6079-2885
  surname: Voicila
  fullname: Voicila, Adrian
  email: adrian.voicila@renault.com
  organization: Renault Software Factory, 2600 Rte des Crêtes, Valbonne, France
– sequence: 3
  givenname: Andrew I.
  surname: Comport
  fullname: Comport, Andrew I.
  email: andrew.comport@cnrs.fr
  organization: CNRS-I3S, Côte d'Azur University, 2000 Route des Lucioles, BP 121, Sophia Antipolis, France
BackLink https://hal.science/hal-03841241 (View record in HAL)
CODEN IRALC6
CitedBy_id crossref_primary_10_1109_JSEN_2024_3370821
Cites_doi 10.1109/CVPR42600.2020.00742
10.1609/aaai.v32i1.12257
10.1109/CVPR.2016.90
10.1109/IROS51168.2021.9636075
10.1109/ICCV48922.2021.00482
10.1109/CVPR.2016.350
10.1109/CVPR42600.2020.00481
10.5555/3454287.3455008
10.1109/TPAMI.2019.2930258
10.1109/tpami.2022.3225078
10.1109/3DV53792.2021.00072
10.1007/978-3-030-58529-7_34
10.1109/ICCV48922.2021.01249
10.1609/aaai.v35i3.16281
10.1109/TIP.2003.819861
10.1109/CVPR42600.2020.00256
10.1109/CVPR.2019.01252
10.1109/ICCV.2019.00907
10.1109/ICCV.2013.293
10.1007/978-3-030-58565-5_35
10.1109/CVPR42600.2020.01079
10.1007/s11263-021-01445-z
10.1109/ICCV.2017.322
10.1109/CVPR.2019.00273
10.1109/CVPR.2012.6248074
10.1109/CVPR.2018.00212
10.1109/CVPR46437.2021.00122
10.1109/ICCV.2019.00716
10.1109/CVPR52688.2022.00864
10.1109/ICCV.2019.00393
10.1109/CVPR.2017.700
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Distributed under a Creative Commons Attribution 4.0 International License
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
– notice: Distributed under a Creative Commons Attribution 4.0 International License
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
1XC
DOI 10.1109/LRA.2022.3194951
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE/IET Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Hyper Article en Ligne (HAL)
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database


Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore Digital Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 2377-3766
EndPage 10968
ExternalDocumentID oai_HAL_hal_03841241v1
10_1109_LRA_2022_3194951
9844821
Genre orig-research
GrantInformation_xml – fundername: GENCI-IDRIS
  grantid: 2021-011011931
– fundername: Association Nationale Recherche Technologie
  grantid: 2019/1649
GroupedDBID 0R~
97E
AAJGR
AASAJ
ABQJQ
ABVLG
ACGFS
AKJIK
ALMA_UNASSIGNED_HOLDINGS
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
EJD
IFIPE
IPLJI
JAVBF
KQ8
M43
M~E
O9-
OCL
RIA
RIE
RIG
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
1XC
IEDL.DBID RIE
ISSN 2377-3766
IngestDate Tue Oct 15 15:48:48 EDT 2024
Thu Oct 10 17:54:41 EDT 2024
Fri Aug 23 01:04:12 EDT 2024
Wed Jun 26 19:25:04 EDT 2024
IsPeerReviewed true
IsScholarly true
Issue 4
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
LinkModel DirectLink
ORCID 0000-0002-6079-2885
0000-0003-4151-5281
PQID 2705852935
PQPubID 4437225
PageCount 7
ParticipantIDs hal_primary_oai_HAL_hal_03841241v1
proquest_journals_2705852935
ieee_primary_9844821
crossref_primary_10_1109_LRA_2022_3194951
PublicationCentury 2000
PublicationDate 2022-10-01
PublicationDateYYYYMMDD 2022-10-01
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-10-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE robotics and automation letters
PublicationTitleAbbrev LRA
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref35
ref12
ref34
ref15
ref37
ref14
ref30
Eigen (ref7) 2014
ref11
ref33
ref10
Vijayanarasimhan (ref32) 2017
ref17
ref39
ref38
ref19
Wu (ref36) 2019
ref18
Vaswani (ref31) 2017
ref24
ref23
ref26
ref25
ref20
ref41
ref22
Kingma (ref16) 2014
Choi (ref5) 2020
ref28
ref27
McCraith (ref21) 2020
ref8
Chen (ref2) 2018
ref9
ref4
ref3
ref6
Bian (ref1) 2019
ref40
Tan (ref29) 2019
References_xml – ident: ref14
  doi: 10.1109/CVPR42600.2020.00742
– start-page: 35
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2019
  ident: ref1
  article-title: Unsupervised scale-consistent depth and ego-motion learning from monocular video
  contributor:
    fullname: Bian
– start-page: 2366
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2014
  ident: ref7
  article-title: Depth map prediction from a single image using a multi-scale deep network
  contributor:
    fullname: Eigen
– ident: ref39
  doi: 10.1609/aaai.v32i1.12257
– ident: ref12
  doi: 10.1109/CVPR.2016.90
– ident: ref37
  doi: 10.1109/IROS51168.2021.9636075
– ident: ref19
  doi: 10.1109/ICCV48922.2021.00482
– ident: ref6
  doi: 10.1109/CVPR.2016.350
– ident: ref15
  doi: 10.1109/CVPR42600.2020.00481
– volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2017
  ident: ref31
  article-title: Attention is all you need
  contributor:
    fullname: Vaswani
– ident: ref24
  doi: 10.5555/3454287.3455008
– ident: ref20
  doi: 10.1109/TPAMI.2019.2930258
– ident: ref38
  doi: 10.1109/tpami.2022.3225078
– ident: ref27
  doi: 10.1109/3DV53792.2021.00072
– ident: ref28
  doi: 10.1007/978-3-030-58529-7_34
– ident: ref33
  doi: 10.1109/ICCV48922.2021.01249
– ident: ref18
  doi: 10.1609/aaai.v35i3.16281
– ident: ref34
  doi: 10.1109/TIP.2003.819861
– ident: ref26
  doi: 10.1109/CVPR42600.2020.00256
– ident: ref25
  doi: 10.1109/CVPR.2019.01252
– ident: ref10
  doi: 10.1109/ICCV.2019.00907
– ident: ref13
  doi: 10.1109/ICCV.2013.293
– ident: ref17
  doi: 10.1007/978-3-030-58565-5_35
– volume-title: Proc. 34th Conf. Neural Inf. Process. Syst.
  year: 2020
  ident: ref5
  article-title: SAFENet: Self-supervised monocular depth estimation with semantic-aware feature extraction
  contributor:
    fullname: Choi
– ident: ref30
  doi: 10.1109/CVPR42600.2020.01079
– year: 2017
  ident: ref32
  article-title: SFM-Net: Learning of structure and motion from video
  contributor:
    fullname: Vijayanarasimhan
– ident: ref23
  doi: 10.1007/s11263-021-01445-z
– ident: ref11
  doi: 10.1109/ICCV.2017.322
– start-page: 8713
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  year: 2018
  ident: ref2
  article-title: Searching for efficient multi-scale architectures for dense image prediction
  contributor:
    fullname: Chen
– ident: ref3
  doi: 10.1109/CVPR.2019.00273
– year: 2019
  ident: ref36
  article-title: Detectron2
  contributor:
    fullname: Wu
– ident: ref8
  doi: 10.1109/CVPR.2012.6248074
– ident: ref40
  doi: 10.1109/CVPR.2018.00212
– ident: ref35
  doi: 10.1109/CVPR46437.2021.00122
– year: 2014
  ident: ref16
  article-title: Adam: A method for stochastic optimization
  contributor:
    fullname: Kingma
– ident: ref4
  doi: 10.1109/ICCV.2019.00716
– year: 2020
  ident: ref21
  article-title: Monocular depth estimation with self-supervised instance adaptation
  contributor:
    fullname: McCraith
– ident: ref22
  doi: 10.1109/CVPR52688.2022.00864
– ident: ref9
  doi: 10.1109/ICCV.2019.00393
– start-page: 6105
  volume-title: Proc. Int. Conf. Mach. Learn.
  year: 2019
  ident: ref29
  article-title: EfficientNet: Rethinking model scaling for convolutional neural networks
  contributor:
    fullname: Tan
– ident: ref41
  doi: 10.1109/CVPR.2017.700
SSID ssj0001527395
SourceID hal
proquest
crossref
ieee
SourceType Open Access Repository
Aggregation Database
Publisher
StartPage 10962
SubjectTerms Artificial Intelligence
Automatic
Benchmarks
Cameras
Computer Science
Computer Vision and Pattern Recognition
Depth prediction
Dynamics
Engineering Sciences
Head
motion prediction
multi-object detection
Object motion
Performance degradation
Pose estimation
Proposals
Robotics
Semantics
Signal and Image processing
Training
Transformers
Title Instance-Aware Multi-Object Self-Supervision for Monocular Depth Prediction
URI https://ieeexplore.ieee.org/document/9844821
https://www.proquest.com/docview/2705852935
https://hal.science/hal-03841241
Volume 7
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE