Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion

Bibliographic Details
Published in Scientific reports Vol. 14; no. 1; p. 7037
Main Authors Xia, Zhongyi, Wu, Tianzhao, Wang, Zhuoyan, Zhou, Man, Wu, Boqi, Chan, C. Y., Kong, Ling Bing
Format Journal Article
Language English
Published London Nature Publishing Group UK 25.03.2024
Nature Publishing Group
Nature Portfolio
Abstract Stereoscopic display technology plays a significant role in industries, such as film, television and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality and realistic stereoscopic display effects. In addressing the inherent challenges of applying Transformers to depth estimation, the Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. This method utilizes stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. The training process is divided into fine and coarse convergence stages, employing distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift and scale-invariant mean square error function is employed to compensate for the lack of translational invariance in the Transformers. Additionally, an edge-smoothing function is applied to reduce noise in the depth map, enhancing the model's robustness. The SPT-Depth achieves a global receptive field while effectively reducing time complexity. In comparison with the baseline method, with the New York University Depth V2 (NYU Depth V2) dataset, there is a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE). When compared with the state-of-the-art methods, there is a 17% reduction in RMSE.
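The loss design summarized in the abstract (a shift- and scale-invariant mean square error plus an edge-smoothing term) is not spelled out in this record, so the following is only a minimal PyTorch-style sketch using the standard formulations from the monocular-depth literature; the function names and the least-squares alignment are assumptions for illustration, not the authors' exact method.

import torch
import torch.nn.functional as F

def shift_scale_invariant_mse(pred, target, eps=1e-6):
    # Align the prediction to the ground truth with a per-image least-squares
    # scale and shift, then take the MSE of the aligned prediction. This removes
    # any global scale/shift ambiguity before the error is measured (assumed form).
    b = pred.shape[0]
    p, t = pred.reshape(b, -1), target.reshape(b, -1)
    p_mean, t_mean = p.mean(1, keepdim=True), t.mean(1, keepdim=True)
    p_c, t_c = p - p_mean, t - t_mean
    scale = (p_c * t_c).sum(1, keepdim=True) / (p_c.pow(2).sum(1, keepdim=True) + eps)
    shift = t_mean - scale * p_mean
    return F.mse_loss(scale * p + shift, t)

def edge_aware_smoothness(depth, image):
    # Penalize depth gradients except where the RGB image itself has strong edges,
    # which suppresses depth-map noise in flat regions while keeping boundaries sharp.
    dd_x = (depth[:, :, :, 1:] - depth[:, :, :, :-1]).abs()
    dd_y = (depth[:, :, 1:, :] - depth[:, :, :-1, :]).abs()
    di_x = (image[:, :, :, 1:] - image[:, :, :, :-1]).abs().mean(1, keepdim=True)
    di_y = (image[:, :, 1:, :] - image[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dd_x * torch.exp(-di_x)).mean() + (dd_y * torch.exp(-di_y)).mean()

A total loss would then weight the two terms, e.g. loss = shift_scale_invariant_mse(pred, gt) + 0.1 * edge_aware_smoothness(pred, rgb), with the weight treated as a tuning hyperparameter rather than a value taken from the paper.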
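The Abs Rel and RMSE figures quoted above follow the usual NYU Depth V2 evaluation protocol; a minimal NumPy sketch of those two metrics (standard definitions, not taken from this record) is:

import numpy as np

def abs_rel(pred, gt):
    # Absolute Relative Error: mean of |pred - gt| / gt over valid ground-truth pixels.
    valid = gt > 0
    return np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid])

def rmse(pred, gt):
    # Root Mean Square Error, in the same units as the depth maps (metres for NYU Depth V2).
    valid = gt > 0
    return np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2))

Lower is better for both, so the reported 10% Abs Rel and 36% RMSE reductions against the baseline are relative decreases in these quantities.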
ArticleNumber 7037
Author Wu, Tianzhao
Wu, Boqi
Xia, Zhongyi
Chan, C. Y.
Wang, Zhuoyan
Kong, Ling Bing
Zhou, Man
Author_xml – sequence: 1
  givenname: Zhongyi
  surname: Xia
  fullname: Xia, Zhongyi
  organization: College of New Materials and New Energies, Shenzhen Technology University, College of Applied Technology, Shenzhen University
– sequence: 2
  givenname: Tianzhao
  surname: Wu
  fullname: Wu, Tianzhao
  organization: College of New Materials and New Energies, Shenzhen Technology University, College of Applied Technology, Shenzhen University
– sequence: 3
  givenname: Zhuoyan
  surname: Wang
  fullname: Wang, Zhuoyan
  organization: College of New Materials and New Energies, Shenzhen Technology University, College of Applied Technology, Shenzhen University
– sequence: 4
  givenname: Man
  surname: Zhou
  fullname: Zhou, Man
  organization: College of New Materials and New Energies, Shenzhen Technology University, College of Applied Technology, Shenzhen University
– sequence: 5
  givenname: Boqi
  surname: Wu
  fullname: Wu, Boqi
  organization: Jilin Jianzhu University
– sequence: 6
  givenname: C. Y.
  surname: Chan
  fullname: Chan, C. Y.
  email: chenzengyuan@sztu.edu.cn
  organization: College of New Materials and New Energies, Shenzhen Technology University
– sequence: 7
  givenname: Ling Bing
  surname: Kong
  fullname: Kong, Ling Bing
  email: konglingbing@sztu.edu.cn
  organization: College of New Materials and New Energies, Shenzhen Technology University
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38528098 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2024
2024. The Author(s).
The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Copyright_xml – notice: The Author(s) 2024
– notice: 2024. The Author(s).
– notice: The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DBID C6C
NPM
AAYXX
CITATION
3V.
7X7
7XB
88A
88E
88I
8FE
8FH
8FI
8FJ
8FK
ABUWG
AFKRA
AZQEC
BBNVY
BENPR
BHPHI
CCPQU
DWQXO
FYUFA
GHDGH
GNUQQ
HCIFZ
K9.
LK8
M0S
M1P
M2P
M7P
PIMPY
PQEST
PQQKQ
PQUKI
PRINS
Q9U
7X8
5PM
DOA
DOI 10.1038/s41598-024-57908-z
DatabaseName Springer_OA刊
PubMed
CrossRef
ProQuest Central (Corporate)
Proquest Health & Medical Complete
ProQuest Central (purchase pre-March 2016)
Biology Database (Alumni Edition)
Medical Database (Alumni Edition)
Science Database (Alumni Edition)
ProQuest SciTech Collection
ProQuest Natural Science Collection
Hospital Premium Collection
Hospital Premium Collection (Alumni Edition)
ProQuest Central (Alumni) (purchase pre-March 2016)
ProQuest Central (Alumni Edition)
ProQuest Central
ProQuest Central Essentials
Biological Science Collection
AUTh Library subscriptions: ProQuest Central
ProQuest Natural Science Collection
ProQuest One Community College
ProQuest Central
Health Research Premium Collection
Health Research Premium Collection (Alumni)
ProQuest Central Student
SciTech Premium Collection (Proquest) (PQ_SDU_P3)
ProQuest Health & Medical Complete (Alumni)
Biological Sciences
Health & Medical Collection (Alumni Edition)
PML(ProQuest Medical Library)
ProQuest Science Journals
Biological Science Database
Publicly Available Content Database
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Academic
ProQuest One Academic UKI Edition
ProQuest Central China
ProQuest Central Basic
MEDLINE - Academic
PubMed Central (Full Participant titles)
DOAJ Directory of Open Access Journals
DatabaseTitle PubMed
CrossRef
Publicly Available Content Database
ProQuest Central Student
ProQuest Central Essentials
ProQuest Health & Medical Complete (Alumni)
ProQuest Central (Alumni Edition)
SciTech Premium Collection
ProQuest One Community College
ProQuest Natural Science Collection
ProQuest Central China
ProQuest Biology Journals (Alumni Edition)
ProQuest Central
Health Research Premium Collection
Health and Medicine Complete (Alumni Edition)
Natural Science Collection
ProQuest Central Korea
Biological Science Collection
ProQuest Medical Library (Alumni)
ProQuest Science Journals (Alumni Edition)
ProQuest Biological Science Collection
ProQuest Central Basic
ProQuest Science Journals
ProQuest One Academic Eastern Edition
ProQuest Hospital Collection
Health Research Premium Collection (Alumni)
Biological Science Database
ProQuest SciTech Collection
ProQuest Hospital Collection (Alumni)
ProQuest Health & Medical Complete
ProQuest Medical Library
ProQuest One Academic UKI Edition
ProQuest One Academic
ProQuest Central (Alumni)
MEDLINE - Academic
DatabaseTitleList CrossRef



PubMed
Publicly Available Content Database
MEDLINE - Academic
Database_xml – sequence: 1
  dbid: C6C
  name: Springer_OA刊
  url: http://www.springeropen.com/
  sourceTypes: Publisher
– sequence: 2
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 3
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 4
  dbid: BENPR
  name: AUTh Library subscriptions: ProQuest Central
  url: https://www.proquest.com/central
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
Discipline Biology
EISSN 2045-2322
EndPage 7037
ExternalDocumentID oai_doaj_org_article_7e8b6d9667784f6b81e6c40dcaa69449
10_1038_s41598_024_57908_z
38528098
Genre Journal Article
GroupedDBID 0R~
3V.
4.4
53G
5VS
7X7
88A
88E
88I
8FE
8FH
8FI
8FJ
AAFWJ
AAJSJ
AAKDD
ABDBF
ABUWG
ACGFS
ACSMW
ADBBV
ADRAZ
AENEX
AFKRA
AJTQC
ALIPV
ALMA_UNASSIGNED_HOLDINGS
AOIJS
AZQEC
BAWUL
BBNVY
BCNDV
BENPR
BHPHI
BPHCQ
BVXVI
C6C
CCPQU
DIK
DWQXO
EBD
EBLON
EBS
ESX
FYUFA
GNUQQ
GROUPED_DOAJ
GX1
HCIFZ
HH5
HMCUK
HYE
KQ8
LK8
M0L
M1P
M2P
M7P
M~E
NAO
OK1
PIMPY
PQQKQ
PROAC
PSQYO
RIG
RNT
RNTTT
RPM
SNYQT
UKHRP
NPM
AAYXX
AFPKN
CITATION
7XB
8FK
K9.
M48
PQEST
PQUKI
PRINS
Q9U
7X8
5PM
IEDL.DBID RPM
ISSN 2045-2322
IngestDate Thu Sep 05 15:43:32 EDT 2024
Tue Sep 17 21:29:01 EDT 2024
Tue Aug 27 04:52:30 EDT 2024
Thu Oct 10 22:56:11 EDT 2024
Fri Aug 23 03:15:21 EDT 2024
Sun Oct 13 09:42:22 EDT 2024
Fri Oct 11 20:56:28 EDT 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Deep learning
Loss function
SPT-Depth
Transformer
Depth estimation
Stereoscopic display
Language English
License 2024. The Author(s).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10963766/
PMID 38528098
PQID 2985401918
PQPubID 2041939
PageCount 1
ParticipantIDs doaj_primary_oai_doaj_org_article_7e8b6d9667784f6b81e6c40dcaa69449
pubmedcentral_primary_oai_pubmedcentral_nih_gov_10963766
proquest_miscellaneous_3014008651
proquest_journals_2985401918
crossref_primary_10_1038_s41598_024_57908_z
pubmed_primary_38528098
springer_journals_10_1038_s41598_024_57908_z
PublicationCentury 2000
PublicationDate 2024-03-25
PublicationDateYYYYMMDD 2024-03-25
PublicationDate_xml – month: 03
  year: 2024
  text: 2024-03-25
  day: 25
PublicationDecade 2020
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle Scientific reports
PublicationTitleAbbrev Sci Rep
PublicationTitleAlternate Sci Rep
PublicationYear 2024
Publisher Nature Publishing Group UK
Nature Publishing Group
Nature Portfolio
Publisher_xml – name: Nature Publishing Group UK
– name: Nature Publishing Group
– name: Nature Portfolio
SSID ssj0000529419
Score 2.4698725
Snippet Stereoscopic display technology plays a significant role in industries, such as film, television and autonomous driving. The accuracy of depth estimation is...
SourceID doaj
pubmedcentral
proquest
crossref
pubmed
springer
SourceType Open Website
Open Access Repository
Aggregation Database
Index Database
Publisher
StartPage 7037
SubjectTerms 639/166
639/705
639/705/1042
639/705/117
Deep learning
Depth estimation
Depth perception
Humanities and Social Sciences
Loss function
Mean square errors
multidisciplinary
Noise reduction
Receptive field
Science
Science (multidisciplinary)
SPT-Depth
Stereoscopic display
Training
Transformer
Title Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion
URI https://link.springer.com/article/10.1038/s41598-024-57908-z
https://www.ncbi.nlm.nih.gov/pubmed/38528098
https://www.proquest.com/docview/2985401918
https://search.proquest.com/docview/3014008651
https://pubmed.ncbi.nlm.nih.gov/PMC10963766
https://doaj.org/article/7e8b6d9667784f6b81e6c40dcaa69449
Volume 14
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider National Library of Medicine