Efficient Per-Shot Transformer-Based Bitrate Ladder Prediction for Adaptive Video Streaming

Bibliographic Details
Published in	2023 IEEE International Conference on Image Processing (ICIP), pp. 1835-1839
Main Authors	Telili, Ahmed; Hamidouche, Wassim; Fezza, Sid Ahmed; Morin, Luce
Format	Conference Proceeding
Language	English
Published	IEEE 08.10.2023
Subjects	adaptive video streaming; Bit rate; Bitrate ladder; Buildings; Computational modeling; Feature extraction; HEVC; Image coding; Streaming media; Transformers; video compression; vision transformer
Summary:	HTTP adaptive streaming (HAS) has become a standard approach for over-the-top (OTT) video streaming services due to its ability to provide smooth streaming. In HAS, each stream representation is encoded at a specific target bitrate, and the resulting set of operating bitrates is known as the bitrate ladder. Historically, a single fixed bitrate ladder has been widely used for all videos. However, such a method ignores video content, which can vary considerably in motion, texture, and scene complexity. Moreover, building a per-title bitrate ladder through exhaustive encoding is expensive because of the large encoding parameter space. Thus, alternative solutions allowing accurate and efficient per-title bitrate ladder prediction are in great demand. Meanwhile, self-attention-based architectures have achieved strong performance in large language models (LLMs) and, in the form of vision transformers (ViTs), in computer vision tasks. Therefore, this paper investigates the capabilities of ViTs in building an efficient bitrate ladder without performing any encoding. We provide the first in-depth analysis of the prediction accuracy and the complexity overhead induced by the ViT model in predicting the bitrate ladder on a large and diverse video dataset. The source code of the proposed solution and the dataset will be made publicly available.
DOI:	10.1109/ICIP49359.2023.10222094