How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model

Bibliographic Details
Published in IEEE transactions on image processing Vol. 34; pp. 3447-3462
Main Authors Zhu, Yuxin, Duan, Huiyu, Zhang, Kaiwei, Zhu, Yucheng, Zhu, Xilei, Teng, Long, Min, Xiongkuo, Zhai, Guangtao
Format Journal Article
Language English
Published United States IEEE 01.01.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text
ISSN 1057-7149
1941-0042
DOI 10.1109/TIP.2025.3567842

Abstract Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality applications. Although both audio and visual modalities are essential for saliency prediction in ODVs, the joint exploitation of these two modalities has been limited, primarily due to the absence of large-scale audio-visual saliency databases and comprehensive analyses. This paper comprehensively investigates audio-visual attention in ODVs from both subjective and objective perspectives. Specifically, we first introduce a new audio-visual saliency database for omnidirectional videos, termed AVS-ODV database, containing 162 ODVs and corresponding eye movement data collected from 60 subjects under three audio modes including mute, mono, and ambisonics. Based on the constructed AVS-ODV database, we perform an in-depth analysis of how audio influences visual attention in ODVs. To advance the research on audio-visual saliency prediction for ODVs, we further establish a new benchmark based on the AVS-ODV database by testing numerous state-of-the-art saliency models, including visual-only models and audio-visual models. In addition, given the limitations of current models, we propose an innovative omnidirectional audio-visual saliency prediction network (OmniAVS), which is built based on the U-Net architecture, and hierarchically fuses audio and visual features from the multimodal aligned embedding space. Extensive experimental results demonstrate that the proposed OmniAVS model outperforms other state-of-the-art models on both ODV AVS prediction and traditional AVS prediction tasks. The AVS-ODV database and the OmniAVS model are available at: https://github.com/IntMeGroup/AVS-ODV .
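The abstract describes the proposed OmniAVS model only at a high level: a U-Net-style network that hierarchically fuses audio and visual features drawn from a multimodal aligned embedding space. As a rough illustration of that fusion idea only, below is a minimal PyTorch sketch; the class names (ToySaliencyUNet, AudioVisualFusionBlock), channel sizes, and the pooled 128-dimensional audio embedding are assumptions made for demonstration and are not taken from the paper or its released code at the linked repository.

# Minimal sketch (assumption, not the authors' OmniAVS implementation) of hierarchical
# audio-visual fusion in a U-Net-style saliency decoder: the same audio embedding is
# projected and injected into the visual feature maps at every decoder scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualFusionBlock(nn.Module):
    """Fuses a global audio embedding into a visual feature map at one scale."""
    def __init__(self, vis_channels, audio_dim):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, vis_channels)  # map audio into the visual channel space
        self.mix = nn.Conv2d(vis_channels * 2, vis_channels, kernel_size=3, padding=1)

    def forward(self, vis_feat, audio_emb):
        b, c, h, w = vis_feat.shape
        a = self.audio_proj(audio_emb).view(b, c, 1, 1).expand(b, c, h, w)  # broadcast audio over space
        return F.relu(self.mix(torch.cat([vis_feat, a], dim=1)))

class ToySaliencyUNet(nn.Module):
    """Toy U-Net: conv encoder, audio fusion at each decoder scale, 1-channel saliency map."""
    def __init__(self, audio_dim=128):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.enc3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.fuse3 = AudioVisualFusionBlock(128, audio_dim)
        self.fuse2 = AudioVisualFusionBlock(64, audio_dim)
        self.fuse1 = AudioVisualFusionBlock(32, audio_dim)
        self.dec3 = nn.Conv2d(128, 64, 3, padding=1)
        self.dec2 = nn.Conv2d(64, 32, 3, padding=1)
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, frame, audio_emb):
        e1 = F.relu(self.enc1(frame))
        e2 = F.relu(self.enc2(e1))
        e3 = F.relu(self.enc3(e2))
        d3 = self.fuse3(e3, audio_emb)                                   # deepest-scale fusion
        d2 = F.relu(self.dec3(F.interpolate(d3, scale_factor=2))) + e2   # upsample + skip connection
        d2 = self.fuse2(d2, audio_emb)
        d1 = F.relu(self.dec2(F.interpolate(d2, scale_factor=2))) + e1
        d1 = self.fuse1(d1, audio_emb)
        return torch.sigmoid(self.head(F.interpolate(d1, scale_factor=2)))  # (B, 1, H, W) saliency map

# Usage: one equirectangular frame and one pooled audio embedding (shapes are illustrative).
frame = torch.randn(1, 3, 256, 512)
audio_emb = torch.randn(1, 128)
print(ToySaliencyUNet()(frame, audio_emb).shape)  # torch.Size([1, 1, 256, 512])

The only point of the sketch is the hierarchy: rather than concatenating the audio representation once at the bottleneck, the audio embedding is fused with the visual features at every decoder resolution, which is the "hierarchical fusion" notion the abstract refers to.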
Author Zhang, Kaiwei
Teng, Long
Min, Xiongkuo
Zhu, Yuxin
Zhu, Yucheng
Zhu, Xilei
Duan, Huiyu
Zhai, Guangtao
Author_xml – sequence: 1
  givenname: Yuxin
  orcidid: 0009-0006-0542-578X
  surname: Zhu
  fullname: Zhu, Yuxin
  email: rye2000@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 2
  givenname: Huiyu
  orcidid: 0000-0002-6519-4067
  surname: Duan
  fullname: Duan, Huiyu
  email: huiyuduan@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 3
  givenname: Kaiwei
  orcidid: 0000-0002-1620-736X
  surname: Zhang
  fullname: Zhang, Kaiwei
  email: zhangkaiwei@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 4
  givenname: Yucheng
  orcidid: 0000-0002-3069-060X
  surname: Zhu
  fullname: Zhu, Yucheng
  email: zyc420@sjtu.edu.cn
  organization: USC-SJTU Institute of Cultural and Creative Industry, Shanghai Jiao Tong University, Shanghai, China
– sequence: 5
  givenname: Xilei
  orcidid: 0000-0001-6035-1353
  surname: Zhu
  fullname: Zhu, Xilei
  email: xilei_zhu@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 6
  givenname: Long
  orcidid: 0000-0001-5568-0849
  surname: Teng
  fullname: Teng, Long
  email: tenglong@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 7
  givenname: Xiongkuo
  orcidid: 0000-0001-5693-0416
  surname: Min
  fullname: Min, Xiongkuo
  email: minxiongkuo@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 8
  givenname: Guangtao
  orcidid: 0000-0001-8165-9322
  surname: Zhai
  fullname: Zhai, Guangtao
  email: zhaiguangtao@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40366826$$D View this record in MEDLINE/PubMed
BookMark eNpdkc1r3DAQxUVJyce29x5CEOTSi7cjjW3Zp7AkTbOQkh7S9GhkaQQKXimxbEL_-2jZTQo9zTDzewNv3gk7CDEQY18ELIWA9tv9-tdSgqyWWNWqKeUHdizaUhQApTzIPVSqUKJsj9hJSo8AoqxEfciOSsC6bmR9zP7cxBd-FSnx1Wx95OvghpmCIf7g06wHvpomCpOPgfvA7zbBWz-S2Q7y8sFbiumCX-lJ9zoR18Hyn9HS8Il9dHpI9HlfF-z39ff7y5vi9u7H-nJ1WxgEnAqy2AhZWdJSIAqCmtAYdKpRrnbaiDavRN-oXvVWtg6dyT26llBKZS0u2Nfd3acxPs-Upm7jk6Fh0IHinDqUUKJUkB0v2Pl_6GOcx2xjSwmlVI2gMnW2p-Z-Q7Z7Gv1Gj3-7t5dlAHaAGWNKI7l3REC3TaXLqXTbVLp9KllyupN4IvqHCwAsRYOvTvCGvA
CODEN IIPRE4
Cites_doi 10.1109/ICMEW.2018.8551543
10.1016/j.visres.2005.03.019
10.1109/TPAMI.2014.2366154
10.1109/34.730558
10.1109/JSTSP.2020.2966864
10.1109/CVPR.2011.5995676
10.1109/TMM.2023.3306596
10.5555/3001460.3001507
10.1109/ICIP.2018.8451338
10.1109/TPAMI.2012.59
10.1007/s12559-010-9074-z
10.1109/CVPR42600.2020.00886
10.1007/978-3-030-58558-7_25
10.1109/CVPR52729.2023.01457
10.1109/TIP.2012.2210727
10.1109/TIP.2017.2787612
10.1109/ISCAS46773.2023.10182000
10.1109/CVPR.2011.5995506
10.1109/CVPR52688.2022.01042
10.1007/978-1-4939-3435-5_16
10.1109/ICIP.2017.8296592
10.1109/TIP.2024.3461956
10.1145/3304109.3325818
10.1109/TVCG.2018.2793599
10.1145/3337066
10.1109/TPAMI.2018.2858783
10.1109/OJID.2024.3351089
10.1145/3240508.3240669
10.1109/CVPR.2019.01045
10.7551/mitpress/7503.003.0073
10.48550/arXiv.2203.16527
10.1007/978-3-319-10584-0_33
10.1109/ICMEW46912.2020.9105956
10.1109/ISCAS.2018.8351786
10.1109/VCIP49819.2020.9301766
10.1109/CVPR.2016.71
10.1109/ICCV.2009.5459462
10.1145/3503161.3547955
10.16910/jemr.6.4.1
10.1167/8.7.32
10.1016/j.image.2018.05.003
10.1109/ICIP42928.2021.9506089
10.1109/CVPR.2008.4587715
10.1145/3576857
10.1167/7.14.4
10.1109/ICCV.2015.38
10.1145/3304109.3325820
10.1109/CVPR52733.2024.02575
10.1109/ICIP46576.2022.9897737
10.48550/ARXIV.1706.03762
10.1007/978-3-031-46317-4_29
10.1109/CVPR.2018.00154
10.1109/CVPR42600.2020.00482
10.1109/ICCV.2019.00248
10.1109/TCSVT.2022.3172971
10.1109/ICPR.2016.7900174
10.1016/j.cag.2022.06.002
10.1109/VCIP.2015.7457921
10.1109/TIP.2022.3220404
10.1016/j.image.2015.08.004
10.1109/ICCV48922.2021.00676
10.1109/TMM.2019.2947352
10.1109/ICCVW.2017.275
10.48550/ARXIV.1406.1078
10.1109/TNNLS.2016.2522440
10.1109/CVPR.2007.383267
10.1007/978-3-030-01264-9_37
10.1109/TIP.2018.2851672
10.1007/978-3-319-24574-4_28
10.1109/TPAMI.2012.147
10.1007/s11432-024-4133-3
10.1109/TPAMI.2019.2924417
10.1167/14.8.5
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TIP.2025.3567842
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
Technology Research Database
MEDLINE - Academic

Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Engineering
Architecture
EISSN 1941-0042
EndPage 3462
ExternalDocumentID 40366826
10_1109_TIP_2025_3567842
11003418
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62401365; 62225112; 62271312; 62132006; U24A20220
  funderid: 10.13039/501100001809
– fundername: Oceanic Interdisciplinary Program of Shanghai Jiao Tong University
  grantid: SL2020ZD102
GroupedDBID ---
-~X
.DC
0R~
29I
4.4
53G
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
HZ~
H~9
ICLAB
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
RIA
RIE
RNS
TAE
TN5
VH1
AAYOK
AAYXX
CITATION
RIG
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c303t-ed38125dea21331e06e3cc3f787f6fac19dea1b87b7bd29f3fc87b3f9e3227dd3
IEDL.DBID RIE
ISSN 1057-7149
1941-0042
IngestDate Wed Jul 02 04:11:09 EDT 2025
Tue Jul 22 15:12:19 EDT 2025
Mon Jul 21 06:04:16 EDT 2025
Thu Jul 03 08:44:26 EDT 2025
Wed Aug 27 01:42:16 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c303t-ed38125dea21331e06e3cc3f787f6fac19dea1b87b7bd29f3fc87b3f9e3227dd3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0001-5568-0849
0000-0001-6035-1353
0009-0006-0542-578X
0000-0002-3069-060X
0000-0002-1620-736X
0000-0002-6519-4067
0000-0001-5693-0416
0000-0001-8165-9322
PMID 40366826
PQID 3217776307
PQPubID 85429
PageCount 16
ParticipantIDs pubmed_primary_40366826
proquest_miscellaneous_3204327040
proquest_journals_3217776307
ieee_primary_11003418
crossref_primary_10_1109_TIP_2025_3567842
PublicationCentury 2000
PublicationDate 2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – month: 01
  year: 2025
  text: 2025-01-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on image processing
PublicationTitleAbbrev TIP
PublicationTitleAlternate IEEE Trans Image Process
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref12
ref56
ref15
ref59
ref14
ref11
ref55
ref10
Zhang (ref35)
Technologies (ref52) 2023
ref17
Linardos (ref74)
ref16
ref19
ref51
ref45
ref89
ref47
ref42
ref86
ref41
ref85
ref43
ref87
Radford (ref88)
Video (ref53) 2023
Zhu (ref3) 2023
ref8
ref7
ref9
ref4
ref6
ref5
ref82
ref81
Cerf (ref18)
ref40
ref84
ref83
Tavakoli (ref31) 2019
ref80
Gutiérrez (ref57) 2018; 69
ref79
ref34
ref78
ref37
ref36
ref75
ref30
Dahou (ref44)
ref33
ref77
ref32
ref76
Business (ref50) 2023
ref2
ref1
ref39
ref38
Archive (ref54) 2023
Pan (ref70) 2017
ref71
ref73
ref72
Robitza (ref48) 2022
ref24
ref68
ref23
ref67
ref26
ref25
ref69
ref20
ref64
ref63
ref22
ref66
ref21
ref65
ref28
ref27
Morgado (ref58); 31
ref29
(ref46) 2023
ref60
ref62
Giannakopoulos (ref49) 2014
ref61
References_xml – ident: ref42
  doi: 10.1109/ICMEW.2018.8551543
– ident: ref59
  doi: 10.1016/j.visres.2005.03.019
– ident: ref12
  doi: 10.1109/TPAMI.2014.2366154
– ident: ref15
  doi: 10.1109/34.730558
– ident: ref5
  doi: 10.1109/JSTSP.2020.2966864
– ident: ref21
  doi: 10.1109/CVPR.2011.5995676
– ident: ref37
  doi: 10.1109/TMM.2023.3306596
– ident: ref56
  doi: 10.5555/3001460.3001507
– ident: ref13
  doi: 10.1109/ICIP.2018.8451338
– ident: ref62
  doi: 10.1109/TPAMI.2012.59
– ident: ref25
  doi: 10.1007/s12559-010-9074-z
– volume-title: Vive Pro Eye
  year: 2023
  ident: ref50
– ident: ref85
  doi: 10.1109/CVPR42600.2020.00886
– ident: ref6
  doi: 10.1007/978-3-030-58558-7_25
– ident: ref40
  doi: 10.1109/CVPR52729.2023.01457
– ident: ref66
  doi: 10.1109/TIP.2012.2210727
– ident: ref81
  doi: 10.1109/TIP.2017.2787612
– ident: ref7
  doi: 10.1109/ISCAS46773.2023.10182000
– ident: ref22
  doi: 10.1109/CVPR.2011.5995506
– ident: ref65
  doi: 10.1109/CVPR52688.2022.01042
– ident: ref27
  doi: 10.1007/978-1-4939-3435-5_16
– ident: ref47
  doi: 10.1109/ICIP.2017.8296592
– ident: ref24
  doi: 10.1109/TIP.2024.3461956
– volume-title: Introduction to Audio Analysis: A MATLAB® Approach
  year: 2014
  ident: ref49
– ident: ref14
  doi: 10.1145/3304109.3325818
– ident: ref4
  doi: 10.1109/TVCG.2018.2793599
– ident: ref8
  doi: 10.1145/3337066
– ident: ref34
  doi: 10.1109/TPAMI.2018.2858783
– ident: ref33
  doi: 10.1109/OJID.2024.3351089
– ident: ref80
  doi: 10.1145/3240508.3240669
– ident: ref82
  doi: 10.1109/CVPR.2019.01045
– ident: ref16
  doi: 10.7551/mitpress/7503.003.0073
– ident: ref89
  doi: 10.48550/arXiv.2203.16527
– volume-title: Media Player
  year: 2023
  ident: ref53
– ident: ref29
  doi: 10.1007/978-3-319-10584-0_33
– ident: ref38
  doi: 10.1109/ICMEW46912.2020.9105956
– volume-title: Facebook-360-Spatial-Workstation
  year: 2023
  ident: ref54
– ident: ref2
  doi: 10.1109/ISCAS.2018.8351786
– ident: ref39
  doi: 10.1109/VCIP49819.2020.9301766
– ident: ref78
  doi: 10.1109/CVPR.2016.71
– ident: ref10
  doi: 10.1109/ICCV.2009.5459462
– ident: ref72
  doi: 10.1145/3503161.3547955
– ident: ref60
  doi: 10.16910/jemr.6.4.1
– ident: ref19
  doi: 10.1167/8.7.32
– volume: 69
  start-page: 35
  year: 2018
  ident: ref57
  article-title: Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360° still images
  publication-title: Signal Process., Image Commun.
  doi: 10.1016/j.image.2018.05.003
– ident: ref86
  doi: 10.1109/ICIP42928.2021.9506089
– ident: ref20
  doi: 10.1109/CVPR.2008.4587715
– start-page: 182
  volume-title: Proc. Brit. Mach. Vis. Conf. (BMVC)
  ident: ref74
  article-title: Simple vs complex temporal recurrences for video saliency prediction
– ident: ref87
  doi: 10.1145/3576857
– ident: ref67
  doi: 10.1167/7.14.4
– ident: ref68
  doi: 10.1109/ICCV.2015.38
– ident: ref55
  doi: 10.1145/3304109.3325820
– ident: ref75
  doi: 10.1109/CVPR52733.2024.02575
– start-page: 241
  volume-title: Proc. NIPS
  ident: ref18
  article-title: Predicting human gaze using low-level saliency combined with face detection
– volume-title: Unity
  year: 2023
  ident: ref52
– ident: ref36
  doi: 10.1109/ICIP46576.2022.9897737
– ident: ref45
  doi: 10.48550/ARXIV.1706.03762
– ident: ref41
  doi: 10.1007/978-3-031-46317-4_29
– ident: ref43
  doi: 10.1109/CVPR.2018.00154
– ident: ref32
  doi: 10.1109/CVPR42600.2020.00482
– ident: ref73
  doi: 10.1109/ICCV.2019.00248
– ident: ref9
  doi: 10.1109/TCSVT.2022.3172971
– ident: ref69
  doi: 10.1109/ICPR.2016.7900174
– ident: ref76
  doi: 10.1016/j.cag.2022.06.002
– volume: 31
  start-page: 360
  volume-title: Proc. Neural Inf. Process. Syst. (NeurIPS)
  ident: ref58
  article-title: Self-supervised generation of spatial audio for 360° video
– year: 2017
  ident: ref70
  article-title: SalGAN: Visual saliency prediction with generative adversarial networks
  publication-title: arXiv:1701.01081
– start-page: 8748
  volume-title: Proc. Int. Conf. Mach. Learn. (ICML)
  ident: ref88
  article-title: Learning transferable visual models from natural language supervision
– ident: ref28
  doi: 10.1109/VCIP.2015.7457921
– year: 2023
  ident: ref3
  article-title: Perceptual quality assessment of omnidirectional audio-visual signals
  publication-title: arXiv:2307.10813
– volume-title: Siti-Tools
  year: 2022
  ident: ref48
– ident: ref51
  doi: 10.1109/TIP.2022.3220404
– ident: ref30
  doi: 10.1016/j.image.2015.08.004
– start-page: 488
  volume-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
  ident: ref35
  article-title: Saliency detection in 360° videos
– ident: ref64
  doi: 10.1109/ICCV48922.2021.00676
– ident: ref83
  doi: 10.1109/TMM.2019.2947352
– ident: ref79
  doi: 10.1109/ICCVW.2017.275
– ident: ref63
  doi: 10.48550/ARXIV.1406.1078
– year: 2019
  ident: ref31
  article-title: DAVE: A deep audio-visual embedding for dynamic saliency prediction
  publication-title: arXiv:1905.10693
– ident: ref77
  doi: 10.1109/TNNLS.2016.2522440
– start-page: 305
  volume-title: Proc. Pattern Recognit. ICPR Int. Workshops Challenges
  ident: ref44
  article-title: ATSal: An attention based architecture for saliency prediction in 360° videos
– ident: ref17
  doi: 10.1109/CVPR.2007.383267
– volume-title: Insta360. Insta360 Pro 2
  year: 2023
  ident: ref46
– ident: ref84
  doi: 10.1007/978-3-030-01264-9_37
– ident: ref71
  doi: 10.1109/TIP.2018.2851672
– ident: ref61
  doi: 10.1007/978-3-319-24574-4_28
– ident: ref23
  doi: 10.1109/TPAMI.2012.147
– ident: ref1
  doi: 10.1007/s11432-024-4133-3
– ident: ref11
  doi: 10.1109/TPAMI.2019.2924417
– ident: ref26
  doi: 10.1167/14.8.5
SSID ssj0014516
Score 2.4649367
Snippet Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 3447
SubjectTerms Ambisonics
Architecture
Attention
Audio data
audio-visual
Augmented reality
Eye movements
Head
MONOS devices
omnidirectional videos
Prediction algorithms
Predictive models
Salience
Saliency prediction
Solid modeling
Video
Videos
Virtual reality
visual attention
Visual databases
Visualization
Title How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
URI https://ieeexplore.ieee.org/document/11003418
https://www.ncbi.nlm.nih.gov/pubmed/40366826
https://www.proquest.com/docview/3217776307
https://www.proquest.com/docview/3204327040
Volume 34
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3Nb9UwDLfYTnBgMAYUBgoSFw59a5ukaU5oYkxvSAwOG-xWNV_S06BFtBUSfz122j4mpEncIiVK09jxRxz7B_C61FKboEIquBOpKEyeaicJNdVam-eOZ5YShT-el-tL8eFKXs3J6jEXxnsfH5_5FTVjLN91dqSrsiMqb4ZSt9qBHfTcpmStbciAEGdjaFOqVKHdv8QkM310cfYZPcFCrrhE2SwIwUag5C4rKqlwQx1FfJXbTc2ock734HxZ7PTS5Ho1DmZlf_9Tx_G__-YB3J-NT3Y8cctDuOPbfdibDVE2H_N-H-7dqFL4CL6uu1_spPM9Ox7dpmNnC64J-7LpR5pvGKZHk2zTsk_f282kJ-MlI45xvuvfspNmaEhlsqZ1jBDYvh3A5en7i3frdMZjSC0quiH1DtV7IZ1vCvRsc5-VnlvLA575UIbG5hq7clMpo4wrdODBYpsH7VFqKOf4Y9htu9Y_BWaCdIU3UhBGTtW4BlmjklqE4HIRrE3gzUKW-sdUdqOO7kqma6RmTdSsZ2omcECb-3fcvK8JHC6ErOeD2dccXTCFMjVTCbzaduORojhJ0_pupDFUp1CheEvgycQA28kXvnl2y0efw11a23RJcwi7w8_Rv0CzZTAvI7v-AdwD5rw
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1Lb9QwEB5BOQAHCqWF0AJG4sIh2yS24_iEKkq1C-3CYQu9RfFLWgFJ1SRC4tczdpJthVSJmyVbjuN52uOZD-BtLrlUTriYUcNilqk0loZ71FStdZoammifKHy2zOfn7NMFvxiT1UMujLU2PD6zM98MsXzT6N5flR368maodYu7cA8NP0-HdK1N0MBjzobgJhexQM9_ikom8nC1-IpnwYzPKEftzDyGDUPdnRe-qMINgxQQVm53NoPROdmG5bTc4a3Jj1nfqZn-808lx__-n8fwaHQ_ydHAL0_gjq13YHt0Rcko6O0OPLxRp_ApfJ83v8lxY1ty1Jt1QxYTsgn5tm57P1_XDc8mybomX37V68FShmtGHGNs074nx1VXeaNJqtoQj8H2cxfOTz6uPszjEZEh1mjqutgaNPAZN7bK8Gyb2iS3VGvqUOpd7iqdSuxKVSGUUCaTjjqNbeqkRb0hjKF7sFU3tX0ORDluMqs48yg5RWUqZI6CS-acSZnTOoJ3E1nKy6HwRhkOLIkskZqlp2Y5UjOCXb-51-PGfY3gYCJkOYpmW1I8hAnUqomI4M2mG4XKR0qq2ja9H-MrFQpUcBE8GxhgM_nENy9u-ehruD9fnZ2Wp4vl53144Nc5XNkcwFZ31duX6MR06lVg3b8OUeoF
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=How+Does+Audio+Influence+Visual+Attention+in+Omnidirectional+Videos%3F+Database+and+Model&rft.jtitle=IEEE+transactions+on+image+processing&rft.au=Zhu%2C+Yuxin&rft.au=Duan%2C+Huiyu&rft.au=Zhang%2C+Kaiwei&rft.au=Zhu%2C+Yucheng&rft.date=2025-01-01&rft.issn=1941-0042&rft.eissn=1941-0042&rft.volume=PP&rft_id=info:doi/10.1109%2FTIP.2025.3567842&rft.externalDBID=NO_FULL_TEXT
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1057-7149&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1057-7149&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1057-7149&client=summon