Instance-Aware Multi-Object Self-Supervision for Monocular Depth Prediction
Published in | IEEE Robotics and Automation Letters, Vol. 7, No. 4, pp. 10962-10968 |
---|---|
Main Authors | Boulahbal, Houssem Eddine; Voicila, Adrian; Comport, Andrew I. |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2022 |
Subjects | Depth prediction; Pose estimation; Transformers; Multi-object detection; Object motion; Robotics |
Abstract | This letter proposes a self-supervised monocular image-to-depth prediction framework that is trained with an end-to-end photometric loss that handles not only 6-DOF camera motion but also 6-DOF moving object instances. Self-supervision is performed by warping the images across a video sequence using depth and scene motion including object instances. One novelty of the proposed method is the use of the multi-head attention of the transformer network that matches moving objects across time and models their interaction and dynamics. This enables accurate and robust pose estimation for each object instance. Most image-to-depth prediction frameworks make the assumption of rigid scenes, which largely degrades their performance with respect to dynamic objects. Only a few state-of-the-art (SOTA) papers have accounted for dynamic objects. The proposed method is shown to outperform these methods on standard benchmarks and the impact of the dynamic motion on these benchmarks is exposed. Furthermore, the proposed image-to-depth prediction framework is also shown to be competitive with SOTA video-to-depth prediction frameworks. |
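For readers unfamiliar with photometric self-supervision, the sketch below illustrates the kind of instance-aware warping loss the abstract describes: the target-view depth is back-projected to 3-D, the 6-DOF ego-motion is applied everywhere, each instance mask additionally receives its own 6-DOF rigid transform, and the source frame is sampled at the re-projected coordinates. This is a minimal PyTorch sketch, not the authors' implementation; all function names, tensor shapes, and the ego-motion/object-motion composition order are illustrative assumptions, and the SSIM term commonly paired with the L1 term is omitted.

```python
# Minimal sketch of instance-aware photometric self-supervision (assumptions noted above).
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    """Lift every pixel to a 3-D camera point using depth and inverse intrinsics."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=depth.dtype, device=depth.device),
        torch.arange(w, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)  # (1, 3, HW)
    return (K_inv @ pix) * depth.reshape(b, 1, -1)                             # (B, 3, HW)


def project(points, K):
    """Project 3-D camera points back to pixel coordinates."""
    pix = K @ points                                                           # (B, 3, HW)
    return pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)                            # (B, 2, HW)


def warp_with_instances(src, depth, K, T_cam, masks, T_objs):
    """Warp `src` into the target view.

    depth  : (B, 1, H, W) predicted target-view depth
    K      : (B, 3, 3)    camera intrinsics
    T_cam  : (B, 4, 4)    target->source camera transform (ego-motion)
    masks  : (B, N, H, W) binary instance masks in the target view
    T_objs : (B, N, 4, 4) additional rigid transform per object instance
    """
    b, _, h, w = depth.shape
    pts = backproject(depth, torch.inverse(K))                                 # (B, 3, HW)
    ones = torch.ones(b, 1, h * w, dtype=depth.dtype, device=depth.device)
    pts_h = torch.cat([pts, ones], dim=1)                                      # (B, 4, HW)

    # Static scene: apply ego-motion everywhere.
    moved = (T_cam @ pts_h)[:, :3]

    # Dynamic objects: inside each instance mask, compose the object's own
    # rigid motion with the ego-motion instead of assuming a rigid scene.
    for i in range(masks.shape[1]):
        m = masks[:, i].reshape(b, 1, -1)                                      # (B, 1, HW)
        obj = (T_cam @ T_objs[:, i] @ pts_h)[:, :3]
        moved = m * obj + (1.0 - m) * moved

    pix = project(moved, K).reshape(b, 2, h, w).permute(0, 2, 3, 1)            # (B, H, W, 2)
    grid = torch.stack(
        [2.0 * pix[..., 0] / (w - 1) - 1.0, 2.0 * pix[..., 1] / (h - 1) - 1.0],
        dim=-1,
    )
    return F.grid_sample(src, grid, padding_mode="border", align_corners=True)


def photometric_loss(target, src, depth, K, T_cam, masks, T_objs):
    """Simple L1 photometric reconstruction loss between target and warped source."""
    warped = warp_with_instances(src, depth, K, T_cam, masks, T_objs)
    return (warped - target).abs().mean()
```

In a training loop, `depth`, `T_cam`, and `T_objs` would come from the depth and pose networks (with the paper's transformer providing the per-instance poses) and `masks` from an instance-segmentation front end; the loss is then minimized over consecutive frames of the video sequence.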
Author | Boulahbal, Houssem Eddine; Voicila, Adrian; Comport, Andrew I. |
Author_xml | – sequence: 1; givenname: Houssem Eddine; orcidid: 0000-0003-4151-5281; surname: Boulahbal; fullname: Boulahbal, Houssem Eddine; email: boulahbal@unice.fr; organization: Renault Software Factory and CNRS-I3S, Côte d'Azur University, 2600 Rte des Crêtes, Valbonne, France
– sequence: 2; givenname: Adrian; orcidid: 0000-0002-6079-2885; surname: Voicila; fullname: Voicila, Adrian; email: adrian.voicila@renault.com; organization: Renault Software Factory, 2600 Rte des Crêtes, Valbonne, France
– sequence: 3; givenname: Andrew I.; surname: Comport; fullname: Comport, Andrew I.; email: andrew.comport@cnrs.fr; organization: CNRS-I3S, Côte d'Azur University, 2000 Route des Lucioles BP 121, Sophia Antipolis, France |
BackLink | https://hal.science/hal-03841241 (View record in HAL) |
CODEN | IRALC6 |
CitedBy_id | crossref_primary_10_1109_JSEN_2024_3370821 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 Distributed under a Creative Commons Attribution 4.0 International License |
DOI | 10.1109/LRA.2022.3194951 |
Discipline | Engineering Computer Science |
EISSN | 2377-3766 |
EndPage | 10968 |
Genre | orig-research |
GrantInformation_xml | – fundername: GENCI-IDRIS grantid: 2021-011011931 – fundername: Association Nationale Recherche Technologie grantid: 2019/1649 |
ISSN | 2377-3766 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0002-6079-2885 0000-0003-4151-5281 |
PageCount | 7 |
PublicationDate | 2022-10-01 |
PublicationPlace | Piscataway |
PublicationTitle | IEEE robotics and automation letters |
PublicationTitleAbbrev | LRA |
PublicationYear | 2022 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 10962 |
SubjectTerms | Artificial Intelligence Automatic Benchmarks Cameras Computer Science Computer Vision and Pattern Recognition Depth prediction Dynamics Engineering Sciences Head motion prediction multi-object detection Object motion Performance degradation Pose estimation Proposals Robotics Semantics Signal and Image processing Training Transformers |
Title | Instance-Aware Multi-Object Self-Supervision for Monocular Depth Prediction |
URI | https://ieeexplore.ieee.org/document/9844821 https://www.proquest.com/docview/2705852935 https://hal.science/hal-03841241 |
Volume | 7 |