Monocular Depth Estimation Network Based on Swin Transformer
Published in | Journal of physics. Conference series Vol. 2428; no. 1; pp. 12019 - 12024 |
---|---|
Main Authors | Yu, Shangbin; Zhang, Renyan; Ma, Shuaiye; Jiang, Xinfang |
Format | Journal Article |
Language | English |
Published | Bristol: IOP Publishing, 01.02.2023 |
Subjects | Coders; Encoders-Decoders; Estimation; Feature extraction; Interpolation; Physics; Transformers |
Abstract | Estimating depth from a single image is challenging because a single 2D image may correspond to many different 3D scenes. Most deep learning based depth prediction methods extract depth features using small convolutional kernels with small receptive fields, which results in deformed depth edges and inaccurate depth values for distant objects. To address this problem, we propose a depth estimation network based on the Swin Transformer and an encoder-decoder structure. We construct the encoder using the Swin Transformer network, which can encode long-range spatial dependencies and extract features at various scales and across different channels. The decoder fuses the features from the encoder through interpolation, concatenation, and convolution. Experiments on the KITTI and NYUv2 datasets show that the proposed network produces more accurate depth edges and depth values than state-of-the-art methods. |
---|---|
Author | Jiang, Xinfang; Yu, Shangbin; Zhang, Renyan; Ma, Shuaiye |
Author_xml | 1. Yu, Shangbin; 2. Zhang, Renyan; 3. Ma, Shuaiye; 4. Jiang, Xinfang (all authors: College of Electrical Engineering and Automation, Shandong University of Science and Technology, China) |
CitedBy_id | crossref_primary_10_3390_s23249866 |
Cites_doi | 10.1109/TIP.2018.2877944 |
ContentType | Journal Article |
Copyright | Published under licence by IOP Publishing Ltd. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.1088/1742-6596/2428/1/012019 |
Discipline | Physics |
EISSN | 1742-6596 |
ExternalDocumentID | 10_1088_1742_6596_2428_1_012019 JPCS_2428_1_012019 |
ISSN | 1742-6588 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
License | Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI. |
OpenAccessLink | https://iopscience.iop.org/article/10.1088/1742-6596/2428/1/012019 |
PageCount | 6 |
PublicationCentury | 2000 |
PublicationDate | 20230201 |
PublicationDateYYYYMMDD | 2023-02-01 |
PublicationDate_xml | – month: 02 year: 2023 text: 20230201 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | Bristol |
PublicationPlace_xml | – name: Bristol |
PublicationTitle | Journal of physics. Conference series |
PublicationTitleAlternate | J. Phys.: Conf. Ser |
PublicationYear | 2023 |
Publisher | IOP Publishing |
Publisher_xml | – name: IOP Publishing |
StartPage | 12019 |
SubjectTerms | Coders; Encoders-Decoders; Estimation; Feature extraction; Interpolation; Physics; Transformers |
Title | Monocular Depth Estimation Network Based on Swin Transformer |
URI | https://iopscience.iop.org/article/10.1088/1742-6596/2428/1/012019 https://www.proquest.com/docview/2779156718 |
Volume | 2428 |
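
The abstract describes the decoder as fusing encoder features through interpolation, concatenation, and convolution. The following is only a minimal sketch of one such fusion step in plain Python, not the authors' implementation. The abstract does not state the interpolation mode, the kernel size, or the channel counts, so the nearest-neighbour upsampling, the 1x1 convolution, and all names here (`upsample_nearest`, `concat_channels`, `conv1x1`, `fuse`) are assumptions chosen for illustration.

```python
from typing import List

# A feature map is stored as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]


def upsample_nearest(x: FeatureMap, scale: int = 2) -> FeatureMap:
    """Interpolation step (assumed nearest-neighbour): enlarge each channel."""
    return [
        [[row[j // scale] for j in range(len(row) * scale)]
         for row in channel for _ in range(scale)]
        for channel in x
    ]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Concatenation step: stack two maps of equal spatial size along channels."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Convolution step (assumed 1x1 kernel): per-pixel mix of input channels.

    weights[o][i] is the weight from input channel i to output channel o.
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w_out[i] * x[i][r][c] for i in range(len(x)))
          for c in range(width)]
         for r in range(height)]
        for w_out in weights
    ]


def fuse(deep: FeatureMap, skip: FeatureMap,
         weights: List[List[float]]) -> FeatureMap:
    """Upsample the coarse map, concatenate it with the finer one, then convolve."""
    return conv1x1(concat_channels(upsample_nearest(deep), skip), weights)


# Toy example: a coarse 1-channel 2x2 map fused with a fine 1-channel 4x4 map.
deep = [[[1.0, 2.0], [3.0, 4.0]]]
skip = [[[1.0] * 4 for _ in range(4)]]
fused = fuse(deep, skip, [[0.5, 0.5]])  # one output channel averaging both inputs
```

In a real network the weights would be learned and the operations would run on tensors in a deep learning framework; the sketch only shows the order of the three operations named in the abstract.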