Proposal With Alignment: A Bi-Directional Transformer for 360° Video Viewport Proposal

People normally watch 360 ° videos through a head-mounted display, inside which only the content of viewports can be seen. Therefore, viewport proposal, referring to detecting potential viewport candidates, plays an important role in many 360 ° video processing tasks. In this paper, we advance the v...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on circuits and systems for video technology Vol. 34; no. 11; pp. 11423 - 11437
Main Authors Guo, Yichen, Xu, Mai, Jiang, Lai, Deng, Xin, Zhou, Jing, Chen, Gaoxing, Sigal, Leonid
Format Journal Article
LanguageEnglish
Published New York IEEE 01.11.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:People normally watch 360 ° videos through a head-mounted display, inside which only the content of viewports can be seen. Therefore, viewport proposal, referring to detecting potential viewport candidates, plays an important role in many 360 ° video processing tasks. In this paper, we advance the viewport proposal by further aligning the predicted viewports across frames for individual subject. This provides a better methodology and a deeper perspective to learn the human perceptual behaviours on 360 ° videos. Specifically, we first analyze three 360 ° video datasets and obtain several findings on human consistency, objectness and motion of viewports. Inspired by these findings, we propose a bi-directional transformer approach, named BiT, for 360 ° video viewport proposal and alignment. Specifically, BiT is composed of a multi-level residual module, a bi-directional encoder-decoder module and a spherical matching module. This way, the viewports can be well proposed and aligned via considering multi-level, bi-directional and non-local information. Moreover, the aligned viewports by BiT are used to refine the viewports and improve viewport proposal accuracy in return. Finally, we validate that our BiT approach is superior on viewport proposal, compared with the state-of-the-art approaches. Besides, the aligned viewports from BiT is verified to be effective in multiple applications, such as saliency prediction, trajectory prediction and perceptual video compression.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2024.3419910