Monocular Depth Estimation Network Based on Swin Transformer

Bibliographic Details
Published in Journal of physics. Conference series Vol. 2428; no. 1; pp. 12019 - 12024
Main Authors Yu, Shangbin, Zhang, Renyan, Ma, Shuaiye, Jiang, Xinfang
Format Journal Article
Language English
Published Bristol IOP Publishing 01.02.2023
Abstract Estimating depth from a single image is challenging because a single 2D image may correspond to many different 3D scenes with the same depth. Most deep-learning-based depth prediction methods extract depth features using small convolutional kernels with small receptive fields, which results in deformed depth edges and inaccurate depth values for distant objects. To address this problem, we propose a depth estimation network based on the Swin Transformer and an encoder-decoder structure. We construct the encoder using the Swin Transformer network, which can encode long-range spatial dependencies and extract features at various scales and across different channels. The decoder fuses the features from the encoder through interpolation, concatenation, and convolution. Experiments on the KITTI and NYUv2 datasets show that the proposed network produces more accurate depth edges and depth values than state-of-the-art methods.
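The decoder operations named in the abstract (interpolation, concatenation, convolution) can be illustrated with a minimal, dependency-free sketch of one fusion step. This is not the authors' implementation: the function names, the nearest-neighbour interpolation, the 1x1 (pointwise) convolution, and the `[channel][row][column]` nested-list layout are all assumptions chosen to keep the example small.

```python
# Illustrative sketch only. All names, the nearest-neighbour interpolation and
# the 1x1 convolution are assumptions, not details taken from the paper.

def upsample_nearest(fmap, factor):
    """Interpolation: enlarge a [C][H][W] map by an integer factor."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in fmap
    ]


def concat_channels(a, b):
    """Concatenation: stack two [C][H][W] maps along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b


def conv1x1(fmap, weights):
    """Convolution: pointwise mix of channels; weights is [C_out][C_in]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][y][x] for k, wk in enumerate(w_out))
          for x in range(width)]
         for y in range(height)]
        for w_out in weights
    ]


def fuse(deep, skip, weights, factor=2):
    """One hypothetical decoder step: upsample the deep (coarse) feature map,
    concatenate it with the higher-resolution skip feature, then convolve."""
    return conv1x1(concat_channels(upsample_nearest(deep, factor), skip), weights)


# A 1x1 coarse map fused with a 2x2 skip map, averaging the two channels.
fused = fuse([[[1.0]]], [[[1.0, 2.0], [3.0, 4.0]]], [[0.5, 0.5]])
print(fused)
```

In a real network the coarse map would come from a deeper encoder stage and the skip map from a shallower one, and the convolution weights would be learned; here they are fixed by hand purely for illustration.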
Author Affiliation College of Electrical Engineering and Automation, Shandong University of Science and Technology, China (Yu, Shangbin; Zhang, Renyan; Ma, Shuaiye; Jiang, Xinfang)
Copyright Published under licence by IOP Publishing Ltd. This work is published under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1088/1742-6596/2428/1/012019
Discipline Physics
EISSN 1742-6596
ISSN 1742-6588
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
OpenAccessLink https://iopscience.iop.org/article/10.1088/1742-6596/2428/1/012019
PageCount 6
PublicationTitleAlternate J. Phys.: Conf. Ser
SubjectTerms Coders; Encoders-Decoders; Estimation; Feature extraction; Interpolation; Physics; Transformers