Are vision transformers replacing convolutional neural networks in scene interpretation? A review

Bibliographic Details
Published in: Discover Applied Sciences, Vol. 7, No. 9, Article 932 (21 pages)
Main Authors: Rosy, N. Arockia; Balasubadra, K.; Deepa, K.
Format: Journal Article
Language: English
Published: Cham: Springer International Publishing (Springer Nature B.V.), 2025-09-01
Abstract: Visual scene interpretation, the challenging process of observing, exploring, and describing dynamic scenes, underpins reliable and safe interaction with the natural world. Cutting-edge computer vision plays a key role in enabling machines to understand visual scenes much as people do. Recent advances in computer vision have been driven largely by deep learning, and Vision Transformers (ViTs) have emerged as a viable alternative to convolutional neural networks (CNNs): powered by attention mechanisms, ViT-based approaches have matched or surpassed CNNs on several benchmark scene interpretation tasks. This review presents a comprehensive and methodical analysis of recent developments in CNN- and ViT-based models for scene recognition. A total of 142 peer-reviewed studies published between 2017 and 2024 were selected under defined inclusion criteria, focusing on works that evaluate these models on public datasets. The review begins with an overview of the architectural foundations and functional variations of CNNs used for scene interpretation. It then examines the structure of ViTs, including their multi-head self-attention mechanisms, and assesses state-of-the-art ViT variants with respect to design innovations, training strategies, and performance metrics. Finally, we discuss possible future research directions for designing ViT models. This study can therefore serve as a reference for scholars and practitioners developing new ViT architectures in this domain.
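To make the "linear embedding" and "multi-head self-attention" components named in the abstract and keywords concrete, the following is a minimal, illustrative PyTorch sketch of the two core ViT operations. It is not code from the reviewed article; the hyperparameters (224x224 input, 16x16 patches, 768-dimensional embeddings, 12 heads) are assumed ViT-Base-style defaults, and the strided convolution is simply a standard way to split and linearly project patches in one step.

    import torch
    import torch.nn as nn

    class PatchEmbedding(nn.Module):
        """Linear embedding of image patches: split the image into fixed-size
        patches and project each one to a token vector."""
        def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768):
            super().__init__()
            self.num_patches = (img_size // patch_size) ** 2
            # A strided convolution flattens and linearly projects each patch.
            self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

        def forward(self, x):                      # x: (B, C, H, W)
            x = self.proj(x)                       # (B, dim, H/p, W/p)
            return x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)

    class MultiHeadSelfAttention(nn.Module):
        """Scaled dot-product self-attention over patch tokens, split into heads."""
        def __init__(self, dim=768, heads=12):
            super().__init__()
            assert dim % heads == 0
            self.heads, self.head_dim = heads, dim // heads
            self.qkv = nn.Linear(dim, dim * 3)     # joint query/key/value projection
            self.out = nn.Linear(dim, dim)

        def forward(self, x):                      # x: (B, N, dim)
            B, N, D = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, head_dim)
            attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
            attn = attn.softmax(dim=-1)            # token-to-token attention weights
            out = (attn @ v).transpose(1, 2).reshape(B, N, D)
            return self.out(out)

    tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))   # (1, 196, 768)
    print(MultiHeadSelfAttention()(tokens).shape)            # torch.Size([1, 196, 768])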
Article Number: 932
Authors and Affiliations:
– Rosy, N. Arockia: Department of Information Technology, R.M.D. Engineering College
– Balasubadra, K.: Department of Information Technology, R.M.D. Engineering College
– Deepa, K. (kdeepa@kiu.ac.ug): Department of Civil Engineering, School of Applied Science, Kampala International University
Copyright: The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.1007/s42452-025-07574-1
Discipline: Engineering; Architecture
EISSN: 3004-9261; 2523-3971
ISSN: 3004-9261; 2523-3963
Open Access: Yes
Peer Reviewed: Yes
Keywords: Linear embedding; Multi-head attention; Scene interpretation; Convolutional neural networks; Vision transformers
Publication Title: Discover Applied Sciences (Discov Appl Sci)
Publication Date: 2025-09-01
Resource Type: Review article
Subjects: Applied and Technical Physics; Architecture; Artificial neural networks; Attention; Chemistry/Food Science; Computer vision; Convolutional neural networks; Datasets; Deep learning; Earth Sciences; Engineering; Environment; Linear embedding; Machine learning; Materials Science; Multi-head attention; Multimedia; Neural networks; Performance measurement; Recognition; Review; Scene interpretation; Semantics; Vision transformers; Visual observation
Online Access:
– https://link.springer.com/article/10.1007/s42452-025-07574-1
– https://www.proquest.com/docview/3239932882
– https://doaj.org/article/85a4aca86ed74efc91bfa874347044d3