Are vision transformers replacing convolutional neural networks in scene interpretation? A review

Bibliographic Details
Published in: Discover Applied Sciences, Vol. 7, No. 9, Article 932 (21 pages)
Main Authors: Rosy, N. Arockia; Balasubadra, K.; Deepa, K.
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing (Springer Nature B.V.), 2025-09-01
Abstract: Visual scene interpretation, the challenging process of observing, exploring, and describing dynamic scenes, underpins reliable and safe interaction with the natural world. Cutting-edge computer vision plays a key role in enabling machines to understand visual scenes much as people do. Recent advances in computer vision have been driven largely by deep learning, and Vision Transformers (ViTs) have emerged as a viable alternative to convolutional neural networks (CNNs): powered by attention mechanisms, ViT-based approaches have matched or surpassed CNNs on several benchmark scene interpretation tasks. This review presents a comprehensive and methodical analysis of recent developments in CNN- and ViT-based models for scene recognition. A total of 142 peer-reviewed studies published between 2017 and 2024 were selected under defined inclusion criteria, focusing on works that evaluate these models on public datasets. The review begins with an overview of the architectural foundations and functional variations of CNNs used for scene interpretation. It then examines the structure of ViTs, including their multi-head self-attention mechanisms, and assesses state-of-the-art ViT variants with respect to design innovations, training strategies, and performance metrics. Finally, we discuss possible future research directions for designing ViT models. This study can therefore serve as a reference for scholars and practitioners developing new ViT architectures in this domain.
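To make the "linear embedding" and "multi-head self-attention" components named in the abstract and keywords concrete, the following is a minimal, illustrative PyTorch sketch of the two core ViT operations. It is not code from the reviewed article; the hyperparameters (224x224 input, 16x16 patches, 768-dimensional embeddings, 12 heads) are assumed ViT-Base-style defaults, and the strided convolution is simply a standard way to split and linearly project patches in one step.

    import torch
    import torch.nn as nn

    class PatchEmbedding(nn.Module):
        """Linear embedding of image patches: split the image into fixed-size
        patches and project each one to a token vector."""
        def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
            super().__init__()
            self.num_patches = (img_size // patch_size) ** 2
            # A strided convolution flattens and linearly projects each patch.
            self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

        def forward(self, x):                      # x: (B, C, H, W)
            x = self.proj(x)                       # (B, dim, H/p, W/p)
            return x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)

    class MultiHeadSelfAttention(nn.Module):
        """Scaled dot-product self-attention over patch tokens, split into heads."""
        def __init__(self, dim=768, heads=12):
            super().__init__()
            assert dim % heads == 0
            self.heads, self.head_dim = heads, dim // heads
            self.qkv = nn.Linear(dim, dim * 3)     # joint query/key/value projection
            self.out = nn.Linear(dim, dim)

        def forward(self, x):                      # x: (B, N, dim)
            B, N, D = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, head_dim)
            attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
            attn = attn.softmax(dim=-1)            # token-to-token attention weights
            out = (attn @ v).transpose(1, 2).reshape(B, N, D)
            return self.out(out)

    tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))   # (1, 196, 768)
    print(MultiHeadSelfAttention()(tokens).shape)            # torch.Size([1, 196, 768])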
Article Number: 932
Authors and Affiliations:
– Rosy, N. Arockia: Department of Information Technology, R.M.D. Engineering College
– Balasubadra, K.: Department of Information Technology, R.M.D. Engineering College
– Deepa, K. (kdeepa@kiu.ac.ug): Department of Civil Engineering, School of Applied Science, Kampala International University
Copyright: The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1007/s42452-025-07574-1
Discipline: Engineering; Architecture
EISSN: 3004-9261; 2523-3971
ISSN: 3004-9261; 2523-3963
Open Access: Yes
Peer Reviewed: Yes
Keywords: Linear embedding; Multi-head attention; Scene interpretation; Convolutional neural networks; Vision transformers
Publication Title: Discover Applied Sciences (Discov Appl Sci)
Publication Date: 2025-09-01
Resource Type: Review article
Subjects: Applied and Technical Physics; Architecture; Artificial neural networks; Attention; Chemistry/Food Science; Computer vision; Convolutional neural networks; Datasets; Deep learning; Earth Sciences; Engineering; Environment; Linear embedding; Machine learning; Materials Science; Multi-head attention; Multimedia; Neural networks; Performance measurement; Recognition; Review; Scene interpretation; Semantics; Vision transformers; Visual observation
Online Access:
– https://link.springer.com/article/10.1007/s42452-025-07574-1
– https://www.proquest.com/docview/3239932882
– https://doaj.org/article/85a4aca86ed74efc91bfa874347044d3