Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion
Published in | Scientific Reports, Vol. 14, No. 1, Article 7037 |
Main Authors | Xia, Zhongyi; Wu, Tianzhao; Wang, Zhuoyan; Zhou, Man; Wu, Boqi; Chan, C. Y.; Kong, Ling Bing |
Format | Journal Article |
Language | English |
Published | London: Nature Publishing Group UK (Nature Portfolio), 25.03.2024 |
Subjects | Deep learning; Depth estimation; Loss function; SPT-Depth; Stereoscopic display; Transformer |
Online Access | https://doi.org/10.1038/s41598-024-57908-z (open access); PubMed Central: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10963766/ |
Abstract | Stereoscopic display technology plays a significant role in industries such as film, television, and autonomous driving. The accuracy of depth estimation is crucial for achieving high-quality, realistic stereoscopic display effects. To address the inherent challenges of applying Transformers to depth estimation, Stereoscopic Pyramid Transformer-Depth (SPT-Depth) is introduced. This method uses stepwise downsampling to acquire both shallow and deep semantic information, which are subsequently fused. The training process is divided into fine and coarse convergence stages with distinct training strategies and hyperparameters, resulting in a substantial reduction in both training and validation losses. In the training strategy, a shift- and scale-invariant mean square error function compensates for the lack of translational invariance in Transformers, and an edge-smoothing function reduces noise in the depth map, enhancing the model's robustness. SPT-Depth achieves a global receptive field while effectively reducing time complexity. Compared with the baseline method on the New York University Depth V2 (NYU Depth V2) dataset, it achieves a 10% reduction in Absolute Relative Error (Abs Rel) and a 36% decrease in Root Mean Square Error (RMSE); compared with state-of-the-art methods, it achieves a 17% reduction in RMSE. |
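This record does not carry the paper's implementation, but the pyramid idea the abstract describes (stepwise downsampling that yields both shallow and deep semantics, later fused into a single prediction) can be illustrated with a minimal PyTorch-style sketch. Everything below, from the stage count and channel widths to the bilinear upsample-and-concatenate fusion, is an illustrative assumption, not the authors' SPT-Depth architecture.

```python
# Minimal sketch of pyramid feature extraction with multi-scale fusion.
# NOT the authors' SPT-Depth code: the stage count, channel widths, and
# upsample-and-concatenate fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidFusionDepth(nn.Module):
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        # Stepwise downsampling: each stage halves resolution (stride 2),
        # producing progressively deeper, lower-resolution semantics.
        self.stages = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.GELU(),
            ))
            in_ch = out_ch
        # Fuse shallow and deep features after resizing to a common scale.
        self.fuse = nn.Conv2d(sum(channels), 1, kernel_size=3, padding=1)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # keep every scale, shallow to deep
        h, w = feats[0].shape[-2:]  # highest-resolution (shallowest) scale
        up = [F.interpolate(f, size=(h, w), mode="bilinear",
                            align_corners=False) for f in feats]
        return self.fuse(torch.cat(up, dim=1))  # single-channel depth map

# depth = PyramidFusionDepth()(torch.randn(1, 3, 224, 224))  # -> (1, 1, 112, 112)
```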
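The two loss terms named in the abstract also have common formulations in the monocular-depth literature that are worth sketching. A shift- and scale-invariant MSE is typically implemented by least-squares-aligning the prediction to the target per image before measuring the error, and an edge-smoothing term is typically an image-gradient-weighted penalty on depth gradients. The paper's exact definitions and loss weights are not given in this record, so the functions below (and the alpha = 0.1 weighting in the usage comment) are assumptions.

```python
# Hedged sketches of the two loss components named in the abstract; the
# exact SPT-Depth formulations may differ from these common variants.
import torch

def ssi_mse(pred, target, eps=1e-6):
    """Shift- and scale-invariant MSE over (B, 1, H, W) depth maps.

    Least-squares-aligns pred to target per image, so a global scale or
    offset in the prediction incurs no penalty.
    """
    p, t = pred.flatten(1), target.flatten(1)
    pc = p - p.mean(dim=1, keepdim=True)
    tc = t - t.mean(dim=1, keepdim=True)
    scale = (pc * tc).sum(dim=1, keepdim=True) / (pc.pow(2).sum(dim=1, keepdim=True) + eps)
    aligned = scale * pc + t.mean(dim=1, keepdim=True)
    return (aligned - t).pow(2).mean()

def edge_aware_smoothness(depth, image):
    """Penalize depth gradients, downweighted where the image itself has
    edges, so depth stays sharp at boundaries and smooth elsewhere."""
    d_dx = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    d_dy = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    i_dx = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(dim=1, keepdim=True)
    i_dy = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(dim=1, keepdim=True)
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()

# total = ssi_mse(pred, gt) + 0.1 * edge_aware_smoothness(pred, rgb)  # alpha = 0.1 assumed
```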
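Finally, the two reported metrics are standard in depth-estimation benchmarking; a minimal NumPy sketch, assuming predictions and ground truth are already restricted to valid pixels and in the same units, is:

```python
import numpy as np

def abs_rel(pred, gt):
    # Absolute Relative Error: mean of |pred - gt| / gt over valid pixels.
    return float(np.mean(np.abs(pred - gt) / gt))

def rmse(pred, gt):
    # Root Mean Square Error, in the depth units of the dataset.
    return float(np.sqrt(np.mean((pred - gt) ** 2)))
```

On NYU Depth V2, depths are in metres, so RMSE is reported in metres while Abs Rel is dimensionless.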
ArticleNumber | 7037 |
Authors |
1. Zhongyi Xia: College of New Materials and New Energies, Shenzhen Technology University; College of Applied Technology, Shenzhen University
2. Tianzhao Wu: College of New Materials and New Energies, Shenzhen Technology University; College of Applied Technology, Shenzhen University
3. Zhuoyan Wang: College of New Materials and New Energies, Shenzhen Technology University; College of Applied Technology, Shenzhen University
4. Man Zhou: College of New Materials and New Energies, Shenzhen Technology University; College of Applied Technology, Shenzhen University
5. Boqi Wu: Jilin Jianzhu University
6. C. Y. Chan (chenzengyuan@sztu.edu.cn): College of New Materials and New Energies, Shenzhen Technology University
7. Ling Bing Kong (konglingbing@sztu.edu.cn): College of New Materials and New Energies, Shenzhen Technology University |
Copyright | © The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"); notwithstanding the ProQuest Terms and Conditions, the content may be used in accordance with the terms of the License. |
DOI | 10.1038/s41598-024-57908-z |
EISSN | 2045-2322 |
Keywords | Deep learning; Depth estimation; Loss function; SPT-Depth; Stereoscopic display; Transformer |
License | Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third-party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
PMID | 38528098 |
SubjectTerms | Deep learning; Depth estimation; Depth perception; Humanities and Social Sciences; Loss function; Mean square errors; multidisciplinary; Noise reduction; Receptive field; Science (multidisciplinary); SPT-Depth; Stereoscopic display; Training; Transformer |