How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model

Bibliographic Details
Published in IEEE transactions on image processing Vol. 34; pp. 3447-3462
Main Authors Zhu, Yuxin, Duan, Huiyu, Zhang, Kaiwei, Zhu, Yucheng, Zhu, Xilei, Teng, Long, Min, Xiongkuo, Zhai, Guangtao
Format Journal Article
Language English
Published United States IEEE 01.01.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online Access Get full text
ISSN 1057-7149
1941-0042
DOI 10.1109/TIP.2025.3567842

Abstract Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality applications. Although both audio and visual modalities are essential for saliency prediction in ODVs, the joint exploitation of these two modalities has been limited, primarily due to the absence of large-scale audio-visual saliency databases and comprehensive analyses. This paper comprehensively investigates audio-visual attention in ODVs from both subjective and objective perspectives. Specifically, we first introduce a new audio-visual saliency database for omnidirectional videos, termed AVS-ODV database, containing 162 ODVs and corresponding eye movement data collected from 60 subjects under three audio modes including mute, mono, and ambisonics. Based on the constructed AVS-ODV database, we perform an in-depth analysis of how audio influences visual attention in ODVs. To advance the research on audio-visual saliency prediction for ODVs, we further establish a new benchmark based on the AVS-ODV database by testing numerous state-of-the-art saliency models, including visual-only models and audio-visual models. In addition, given the limitations of current models, we propose an innovative omnidirectional audio-visual saliency prediction network (OmniAVS), which is built based on the U-Net architecture, and hierarchically fuses audio and visual features from the multimodal aligned embedding space. Extensive experimental results demonstrate that the proposed OmniAVS model outperforms other state-of-the-art models on both ODV AVS prediction and traditional AVS prediction tasks. The AVS-ODV database and the OmniAVS model are available at: https://github.com/IntMeGroup/AVS-ODV .
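The abstract describes the proposed OmniAVS model only at a high level: a U-Net-style network that hierarchically fuses audio and visual features drawn from a multimodal aligned embedding space. As a rough illustration of that fusion idea only, below is a minimal PyTorch sketch; the class names (ToySaliencyUNet, AudioVisualFusionBlock), channel sizes, and the pooled 128-dimensional audio embedding are assumptions made for demonstration and are not taken from the paper or its released code at the linked repository.

# Minimal sketch (assumption, not the authors' OmniAVS implementation) of hierarchical
# audio-visual fusion in a U-Net-style saliency decoder: the same audio embedding is
# projected and injected into the visual feature maps at every decoder scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualFusionBlock(nn.Module):
    """Fuses a global audio embedding into a visual feature map at one scale."""
    def __init__(self, vis_channels, audio_dim):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, vis_channels)  # map audio into the visual channel space
        self.mix = nn.Conv2d(vis_channels * 2, vis_channels, kernel_size=3, padding=1)

    def forward(self, vis_feat, audio_emb):
        b, c, h, w = vis_feat.shape
        a = self.audio_proj(audio_emb).view(b, c, 1, 1).expand(b, c, h, w)  # broadcast audio over space
        return F.relu(self.mix(torch.cat([vis_feat, a], dim=1)))

class ToySaliencyUNet(nn.Module):
    """Toy U-Net: conv encoder, audio fusion at each decoder scale, 1-channel saliency map."""
    def __init__(self, audio_dim=128):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.enc3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.fuse3 = AudioVisualFusionBlock(128, audio_dim)
        self.fuse2 = AudioVisualFusionBlock(64, audio_dim)
        self.fuse1 = AudioVisualFusionBlock(32, audio_dim)
        self.dec3 = nn.Conv2d(128, 64, 3, padding=1)
        self.dec2 = nn.Conv2d(64, 32, 3, padding=1)
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, frame, audio_emb):
        e1 = F.relu(self.enc1(frame))
        e2 = F.relu(self.enc2(e1))
        e3 = F.relu(self.enc3(e2))
        d3 = self.fuse3(e3, audio_emb)                                   # deepest-scale fusion
        d2 = F.relu(self.dec3(F.interpolate(d3, scale_factor=2))) + e2   # upsample + skip connection
        d2 = self.fuse2(d2, audio_emb)
        d1 = F.relu(self.dec2(F.interpolate(d2, scale_factor=2))) + e1
        d1 = self.fuse1(d1, audio_emb)
        return torch.sigmoid(self.head(F.interpolate(d1, scale_factor=2)))  # (B, 1, H, W) saliency map

# Usage: one equirectangular frame and one pooled audio embedding (shapes are illustrative).
frame = torch.randn(1, 3, 256, 512)
audio_emb = torch.randn(1, 128)
print(ToySaliencyUNet()(frame, audio_emb).shape)  # torch.Size([1, 1, 256, 512])

The only point of the sketch is the hierarchy: rather than concatenating the audio representation once at the bottleneck, the audio embedding is fused with the visual features at every decoder resolution, which is the "hierarchical fusion" notion the abstract refers to.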
Author Zhang, Kaiwei
Teng, Long
Min, Xiongkuo
Zhu, Yuxin
Zhu, Yucheng
Zhu, Xilei
Duan, Huiyu
Zhai, Guangtao
Author_xml – sequence: 1
  givenname: Yuxin
  orcidid: 0009-0006-0542-578X
  surname: Zhu
  fullname: Zhu, Yuxin
  email: rye2000@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 2
  givenname: Huiyu
  orcidid: 0000-0002-6519-4067
  surname: Duan
  fullname: Duan, Huiyu
  email: huiyuduan@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 3
  givenname: Kaiwei
  orcidid: 0000-0002-1620-736X
  surname: Zhang
  fullname: Zhang, Kaiwei
  email: zhangkaiwei@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 4
  givenname: Yucheng
  orcidid: 0000-0002-3069-060X
  surname: Zhu
  fullname: Zhu, Yucheng
  email: zyc420@sjtu.edu.cn
  organization: USC-SJTU Institute of Cultural and Creative Industry, Shanghai Jiao Tong University, Shanghai, China
– sequence: 5
  givenname: Xilei
  orcidid: 0000-0001-6035-1353
  surname: Zhu
  fullname: Zhu, Xilei
  email: xilei_zhu@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 6
  givenname: Long
  orcidid: 0000-0001-5568-0849
  surname: Teng
  fullname: Teng, Long
  email: tenglong@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 7
  givenname: Xiongkuo
  orcidid: 0000-0001-5693-0416
  surname: Min
  fullname: Min, Xiongkuo
  email: minxiongkuo@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
– sequence: 8
  givenname: Guangtao
  orcidid: 0000-0001-8165-9322
  surname: Zhai
  fullname: Zhai, Guangtao
  email: zhaiguangtao@sjtu.edu.cn
  organization: Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40366826$$D View this record in MEDLINE/PubMed
BookMark eNpdkc1r3DAQxUVJyce29x5CEOTSi7cjjW3Zp7AkTbOQkh7S9GhkaQQKXimxbEL_-2jZTQo9zTDzewNv3gk7CDEQY18ELIWA9tv9-tdSgqyWWNWqKeUHdizaUhQApTzIPVSqUKJsj9hJSo8AoqxEfciOSsC6bmR9zP7cxBd-FSnx1Wx95OvghpmCIf7g06wHvpomCpOPgfvA7zbBWz-S2Q7y8sFbiumCX-lJ9zoR18Hyn9HS8Il9dHpI9HlfF-z39ff7y5vi9u7H-nJ1WxgEnAqy2AhZWdJSIAqCmtAYdKpRrnbaiDavRN-oXvVWtg6dyT26llBKZS0u2Nfd3acxPs-Upm7jk6Fh0IHinDqUUKJUkB0v2Pl_6GOcx2xjSwmlVI2gMnW2p-Z-Q7Z7Gv1Gj3-7t5dlAHaAGWNKI7l3REC3TaXLqXTbVLp9KllyupN4IvqHCwAsRYOvTvCGvA
CODEN IIPRE4
Cites_doi 10.1109/ICMEW.2018.8551543
10.1016/j.visres.2005.03.019
10.1109/TPAMI.2014.2366154
10.1109/34.730558
10.1109/JSTSP.2020.2966864
10.1109/CVPR.2011.5995676
10.1109/TMM.2023.3306596
10.5555/3001460.3001507
10.1109/ICIP.2018.8451338
10.1109/TPAMI.2012.59
10.1007/s12559-010-9074-z
10.1109/CVPR42600.2020.00886
10.1007/978-3-030-58558-7_25
10.1109/CVPR52729.2023.01457
10.1109/TIP.2012.2210727
10.1109/TIP.2017.2787612
10.1109/ISCAS46773.2023.10182000
10.1109/CVPR.2011.5995506
10.1109/CVPR52688.2022.01042
10.1007/978-1-4939-3435-5_16
10.1109/ICIP.2017.8296592
10.1109/TIP.2024.3461956
10.1145/3304109.3325818
10.1109/TVCG.2018.2793599
10.1145/3337066
10.1109/TPAMI.2018.2858783
10.1109/OJID.2024.3351089
10.1145/3240508.3240669
10.1109/CVPR.2019.01045
10.7551/mitpress/7503.003.0073
10.48550/arXiv.2203.16527
10.1007/978-3-319-10584-0_33
10.1109/ICMEW46912.2020.9105956
10.1109/ISCAS.2018.8351786
10.1109/VCIP49819.2020.9301766
10.1109/CVPR.2016.71
10.1109/ICCV.2009.5459462
10.1145/3503161.3547955
10.16910/jemr.6.4.1
10.1167/8.7.32
10.1016/j.image.2018.05.003
10.1109/ICIP42928.2021.9506089
10.1109/CVPR.2008.4587715
10.1145/3576857
10.1167/7.14.4
10.1109/ICCV.2015.38
10.1145/3304109.3325820
10.1109/CVPR52733.2024.02575
10.1109/ICIP46576.2022.9897737
10.48550/ARXIV.1706.03762
10.1007/978-3-031-46317-4_29
10.1109/CVPR.2018.00154
10.1109/CVPR42600.2020.00482
10.1109/ICCV.2019.00248
10.1109/TCSVT.2022.3172971
10.1109/ICPR.2016.7900174
10.1016/j.cag.2022.06.002
10.1109/VCIP.2015.7457921
10.1109/TIP.2022.3220404
10.1016/j.image.2015.08.004
10.1109/ICCV48922.2021.00676
10.1109/TMM.2019.2947352
10.1109/ICCVW.2017.275
10.48550/ARXIV.1406.1078
10.1109/TNNLS.2016.2522440
10.1109/CVPR.2007.383267
10.1007/978-3-030-01264-9_37
10.1109/TIP.2018.2851672
10.1007/978-3-319-24574-4_28
10.1109/TPAMI.2012.147
10.1007/s11432-024-4133-3
10.1109/TPAMI.2019.2924417
10.1167/14.8.5
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TIP.2025.3567842
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
Technology Research Database
MEDLINE - Academic

Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Engineering
Architecture
EISSN 1941-0042
EndPage 3462
ExternalDocumentID 40366826
10_1109_TIP_2025_3567842
11003418
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62401365; 62225112; 62271312; 62132006; U24A20220
  funderid: 10.13039/501100001809
– fundername: Oceanic Interdisciplinary Program of Shanghai Jiao Tong University
  grantid: SL2020ZD102
GroupedDBID ---
-~X
.DC
0R~
29I
4.4
53G
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
HZ~
H~9
ICLAB
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
RIA
RIE
RNS
TAE
TN5
VH1
AAYOK
AAYXX
CITATION
RIG
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c303t-ed38125dea21331e06e3cc3f787f6fac19dea1b87b7bd29f3fc87b3f9e3227dd3
IEDL.DBID RIE
ISSN 1057-7149
1941-0042
IngestDate Wed Jul 02 04:11:09 EDT 2025
Tue Jul 22 15:12:19 EDT 2025
Mon Jul 21 06:04:16 EDT 2025
Thu Jul 03 08:44:26 EDT 2025
Wed Aug 27 01:42:16 EDT 2025
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c303t-ed38125dea21331e06e3cc3f787f6fac19dea1b87b7bd29f3fc87b3f9e3227dd3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0001-5568-0849
0000-0001-6035-1353
0009-0006-0542-578X
0000-0002-3069-060X
0000-0002-1620-736X
0000-0002-6519-4067
0000-0001-5693-0416
0000-0001-8165-9322
PMID 40366826
PQID 3217776307
PQPubID 85429
PageCount 16
ParticipantIDs pubmed_primary_40366826
proquest_miscellaneous_3204327040
proquest_journals_3217776307
ieee_primary_11003418
crossref_primary_10_1109_TIP_2025_3567842
PublicationCentury 2000
PublicationDate 2025-01-01
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – month: 01
  year: 2025
  text: 2025-01-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on image processing
PublicationTitleAbbrev TIP
PublicationTitleAlternate IEEE Trans Image Process
PublicationYear 2025
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref12
ref56
ref15
ref59
ref14
ref11
ref55
ref10
Zhang (ref35)
Technologies (ref52) 2023
ref17
Linardos (ref74)
ref16
ref19
ref51
ref45
ref89
ref47
ref42
ref86
ref41
ref85
ref43
ref87
Radford (ref88)
Video (ref53) 2023
Zhu (ref3) 2023
ref8
ref7
ref9
ref4
ref6
ref5
ref82
ref81
Cerf (ref18)
ref40
ref84
ref83
Tavakoli (ref31) 2019
ref80
Gutiérrez (ref57) 2018; 69
ref79
ref34
ref78
ref37
ref36
ref75
ref30
Dahou (ref44)
ref33
ref77
ref32
ref76
Business (ref50) 2023
ref2
ref1
ref39
ref38
Archive (ref54) 2023
Pan (ref70) 2017
ref71
ref73
ref72
Robitza (ref48) 2022
ref24
ref68
ref23
ref67
ref26
ref25
ref69
ref20
ref64
ref63
ref22
ref66
ref21
ref65
ref28
ref27
Morgado (ref58); 31
ref29
(ref46) 2023
ref60
ref62
Giannakopoulos (ref49) 2014
ref61
References_xml – ident: ref42
  doi: 10.1109/ICMEW.2018.8551543
– ident: ref59
  doi: 10.1016/j.visres.2005.03.019
– ident: ref12
  doi: 10.1109/TPAMI.2014.2366154
– ident: ref15
  doi: 10.1109/34.730558
– ident: ref5
  doi: 10.1109/JSTSP.2020.2966864
– ident: ref21
  doi: 10.1109/CVPR.2011.5995676
– ident: ref37
  doi: 10.1109/TMM.2023.3306596
– ident: ref56
  doi: 10.5555/3001460.3001507
– ident: ref13
  doi: 10.1109/ICIP.2018.8451338
– ident: ref62
  doi: 10.1109/TPAMI.2012.59
– ident: ref25
  doi: 10.1007/s12559-010-9074-z
– volume-title: Vive Pro Eye
  year: 2023
  ident: ref50
– ident: ref85
  doi: 10.1109/CVPR42600.2020.00886
– ident: ref6
  doi: 10.1007/978-3-030-58558-7_25
– ident: ref40
  doi: 10.1109/CVPR52729.2023.01457
– ident: ref66
  doi: 10.1109/TIP.2012.2210727
– ident: ref81
  doi: 10.1109/TIP.2017.2787612
– ident: ref7
  doi: 10.1109/ISCAS46773.2023.10182000
– ident: ref22
  doi: 10.1109/CVPR.2011.5995506
– ident: ref65
  doi: 10.1109/CVPR52688.2022.01042
– ident: ref27
  doi: 10.1007/978-1-4939-3435-5_16
– ident: ref47
  doi: 10.1109/ICIP.2017.8296592
– ident: ref24
  doi: 10.1109/TIP.2024.3461956
– volume-title: Introduction to Audio Analysis: A MATLAB® Approach
  year: 2014
  ident: ref49
– ident: ref14
  doi: 10.1145/3304109.3325818
– ident: ref4
  doi: 10.1109/TVCG.2018.2793599
– ident: ref8
  doi: 10.1145/3337066
– ident: ref34
  doi: 10.1109/TPAMI.2018.2858783
– ident: ref33
  doi: 10.1109/OJID.2024.3351089
– ident: ref80
  doi: 10.1145/3240508.3240669
– ident: ref82
  doi: 10.1109/CVPR.2019.01045
– ident: ref16
  doi: 10.7551/mitpress/7503.003.0073
– ident: ref89
  doi: 10.48550/arXiv.2203.16527
– volume-title: Media Player
  year: 2023
  ident: ref53
– ident: ref29
  doi: 10.1007/978-3-319-10584-0_33
– ident: ref38
  doi: 10.1109/ICMEW46912.2020.9105956
– volume-title: Facebook-360-Spatial-Workstation
  year: 2023
  ident: ref54
– ident: ref2
  doi: 10.1109/ISCAS.2018.8351786
– ident: ref39
  doi: 10.1109/VCIP49819.2020.9301766
– ident: ref78
  doi: 10.1109/CVPR.2016.71
– ident: ref10
  doi: 10.1109/ICCV.2009.5459462
– ident: ref72
  doi: 10.1145/3503161.3547955
– ident: ref60
  doi: 10.16910/jemr.6.4.1
– ident: ref19
  doi: 10.1167/8.7.32
– volume: 69
  start-page: 35
  year: 2018
  ident: ref57
  article-title: Toolbox and dataset for the development of saliency and scanpath models for omnidirectional/360° still images
  publication-title: Signal Process., Image Commun.
  doi: 10.1016/j.image.2018.05.003
– ident: ref86
  doi: 10.1109/ICIP42928.2021.9506089
– ident: ref20
  doi: 10.1109/CVPR.2008.4587715
– start-page: 182
  volume-title: Proc. Brit. Mach. Vis. Conf. (BMVC)
  ident: ref74
  article-title: Simple vs complex temporal recurrences for video saliency prediction
– ident: ref87
  doi: 10.1145/3576857
– ident: ref67
  doi: 10.1167/7.14.4
– ident: ref68
  doi: 10.1109/ICCV.2015.38
– ident: ref55
  doi: 10.1145/3304109.3325820
– ident: ref75
  doi: 10.1109/CVPR52733.2024.02575
– start-page: 241
  volume-title: Proc. NIPS
  ident: ref18
  article-title: Predicting human gaze using low-level saliency combined with face detection
– volume-title: Unity
  year: 2023
  ident: ref52
– ident: ref36
  doi: 10.1109/ICIP46576.2022.9897737
– ident: ref45
  doi: 10.48550/ARXIV.1706.03762
– ident: ref41
  doi: 10.1007/978-3-031-46317-4_29
– ident: ref43
  doi: 10.1109/CVPR.2018.00154
– ident: ref32
  doi: 10.1109/CVPR42600.2020.00482
– ident: ref73
  doi: 10.1109/ICCV.2019.00248
– ident: ref9
  doi: 10.1109/TCSVT.2022.3172971
– ident: ref69
  doi: 10.1109/ICPR.2016.7900174
– ident: ref76
  doi: 10.1016/j.cag.2022.06.002
– volume: 31
  start-page: 360
  volume-title: Proc. Neural Inf. Process. Syst. (NeurIPS)
  ident: ref58
  article-title: Self-supervised generation of spatial audio for 360° video
– year: 2017
  ident: ref70
  article-title: SalGAN: Visual saliency prediction with generative adversarial networks
  publication-title: arXiv:1701.01081
– start-page: 8748
  volume-title: Proc. Int. Conf. Mach. Learn. (ICML)
  ident: ref88
  article-title: Learning transferable visual models from natural language supervision
– ident: ref28
  doi: 10.1109/VCIP.2015.7457921
– year: 2023
  ident: ref3
  article-title: Perceptual quality assessment of omnidirectional audio-visual signals
  publication-title: arXiv:2307.10813
– volume-title: Siti-Tools
  year: 2022
  ident: ref48
– ident: ref51
  doi: 10.1109/TIP.2022.3220404
– ident: ref30
  doi: 10.1016/j.image.2015.08.004
– start-page: 488
  volume-title: Proc. Eur. Conf. Comput. Vis. (ECCV)
  ident: ref35
  article-title: Saliency detection in 360° videos
– ident: ref64
  doi: 10.1109/ICCV48922.2021.00676
– ident: ref83
  doi: 10.1109/TMM.2019.2947352
– ident: ref79
  doi: 10.1109/ICCVW.2017.275
– ident: ref63
  doi: 10.48550/ARXIV.1406.1078
– year: 2019
  ident: ref31
  article-title: DAVE: A deep audio-visual embedding for dynamic saliency prediction
  publication-title: arXiv:1905.10693
– ident: ref77
  doi: 10.1109/TNNLS.2016.2522440
– start-page: 305
  volume-title: Proc. Pattern Recognit. ICPR Int. Workshops Challenges
  ident: ref44
  article-title: ATSal: An attention based architecture for saliency prediction in 360° videos
– ident: ref17
  doi: 10.1109/CVPR.2007.383267
– volume-title: Insta360. Insta360 Pro 2
  year: 2023
  ident: ref46
– ident: ref84
  doi: 10.1007/978-3-030-01264-9_37
– ident: ref71
  doi: 10.1109/TIP.2018.2851672
– ident: ref61
  doi: 10.1007/978-3-319-24574-4_28
– ident: ref23
  doi: 10.1109/TPAMI.2012.147
– ident: ref1
  doi: 10.1007/s11432-024-4133-3
– ident: ref11
  doi: 10.1109/TPAMI.2019.2924417
– ident: ref26
  doi: 10.1167/14.8.5
SSID ssj0014516
Score 2.4649367
Snippet Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Publisher
StartPage 3447
SubjectTerms Ambisonics
Architecture
Attention
Audio data
audio-visual
Augmented reality
Eye movements
Head
MONOS devices
omnidirectional videos
Prediction algorithms
Predictive models
Salience
Saliency prediction
Solid modeling
Video
Videos
Virtual reality
visual attention
Visual databases
Visualization
Title How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
URI https://ieeexplore.ieee.org/document/11003418
https://www.ncbi.nlm.nih.gov/pubmed/40366826
https://www.proquest.com/docview/3217776307
https://www.proquest.com/docview/3204327040
Volume 34
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3Nb9UwDLfYTnBgMAYUBgoSFw59a5ukaU5oYkxvSAwOG-xWNV_S06BFtBUSfz122j4mpEncIiVK09jxRxz7B_C61FKboEIquBOpKEyeaicJNdVam-eOZ5YShT-el-tL8eFKXs3J6jEXxnsfH5_5FTVjLN91dqSrsiMqb4ZSt9qBHfTcpmStbciAEGdjaFOqVKHdv8QkM310cfYZPcFCrrhE2SwIwUag5C4rKqlwQx1FfJXbTc2ock734HxZ7PTS5Ho1DmZlf_9Tx_G__-YB3J-NT3Y8cctDuOPbfdibDVE2H_N-H-7dqFL4CL6uu1_spPM9Ox7dpmNnC64J-7LpR5pvGKZHk2zTsk_f282kJ-MlI45xvuvfspNmaEhlsqZ1jBDYvh3A5en7i3frdMZjSC0quiH1DtV7IZ1vCvRsc5-VnlvLA575UIbG5hq7clMpo4wrdODBYpsH7VFqKOf4Y9htu9Y_BWaCdIU3UhBGTtW4BlmjklqE4HIRrE3gzUKW-sdUdqOO7kqma6RmTdSsZ2omcECb-3fcvK8JHC6ErOeD2dccXTCFMjVTCbzaduORojhJ0_pupDFUp1CheEvgycQA28kXvnl2y0efw11a23RJcwi7w8_Rv0CzZTAvI7v-AdwD5rw
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1Lb9QwEB5BOQAHCqWF0AJG4sIh2yS24_iEKkq1C-3CYQu9RfFLWgFJ1SRC4tczdpJthVSJmyVbjuN52uOZD-BtLrlUTriYUcNilqk0loZ71FStdZoammifKHy2zOfn7NMFvxiT1UMujLU2PD6zM98MsXzT6N5flR368maodYu7cA8NP0-HdK1N0MBjzobgJhexQM9_ikom8nC1-IpnwYzPKEftzDyGDUPdnRe-qMINgxQQVm53NoPROdmG5bTc4a3Jj1nfqZn-808lx__-n8fwaHQ_ydHAL0_gjq13YHt0Rcko6O0OPLxRp_ApfJ83v8lxY1ty1Jt1QxYTsgn5tm57P1_XDc8mybomX37V68FShmtGHGNs074nx1VXeaNJqtoQj8H2cxfOTz6uPszjEZEh1mjqutgaNPAZN7bK8Gyb2iS3VGvqUOpd7iqdSuxKVSGUUCaTjjqNbeqkRb0hjKF7sFU3tX0ORDluMqs48yg5RWUqZI6CS-acSZnTOoJ3E1nKy6HwRhkOLIkskZqlp2Y5UjOCXb-51-PGfY3gYCJkOYpmW1I8hAnUqomI4M2mG4XKR0qq2ja9H-MrFQpUcBE8GxhgM_nENy9u-ehruD9fnZ2Wp4vl53144Nc5XNkcwFZ31duX6MR06lVg3b8OUeoF
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=How+Does+Audio+Influence+Visual+Attention+in+Omnidirectional+Videos%3F+Database+and+Model&rft.jtitle=IEEE+transactions+on+image+processing&rft.au=Zhu%2C+Yuxin&rft.au=Duan%2C+Huiyu&rft.au=Zhang%2C+Kaiwei&rft.au=Zhu%2C+Yucheng&rft.date=2025-01-01&rft.issn=1941-0042&rft.eissn=1941-0042&rft.volume=PP&rft_id=info:doi/10.1109%2FTIP.2025.3567842&rft.externalDBID=NO_FULL_TEXT
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1057-7149&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1057-7149&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1057-7149&client=summon