Efficient Per-Shot Transformer-Based Bitrate Ladder Prediction for Adaptive Video Streaming

Bibliographic Details
Published in 2023 IEEE International Conference on Image Processing (ICIP), pp. 1835-1839
Main Authors Telili, Ahmed; Hamidouche, Wassim; Fezza, Sid Ahmed; Morin, Luce
Format Conference Proceeding
Language English
Published IEEE, 8 October 2023
DOI 10.1109/ICIP49359.2023.10222094

Abstract Recently, HTTP adaptive streaming (HAS) has become a standard approach for over-the-top (OTT) video streaming services due to its ability to provide smooth streaming. In HAS, each stream representation is encoded to target a specific bitrate, and together the representations cover a wide range of operating bitrates known as the bitrate ladder. In the past, a single fixed bitrate ladder was widely used for all videos. However, such a method does not consider video content, which can vary considerably in motion, texture, and scene complexity. Moreover, building a per-title bitrate ladder through exhaustive encoding is expensive because of the large encoding parameter space. Thus, alternative solutions allowing accurate and efficient per-title bitrate ladder prediction are in great demand. Meanwhile, self-attention-based architectures have achieved strong performance in large language models (LLMs) and, in computer vision tasks, in vision transformers (ViTs). Therefore, this paper investigates the capability of ViTs to build an efficient bitrate ladder without performing any encoding. We provide the first in-depth analysis of the prediction accuracy and the complexity overhead induced by the ViT model in predicting the bitrate ladder on a large and diverse video dataset. The source code of the proposed solution and the dataset will be made publicly available.
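The exhaustive per-title approach that the abstract describes as expensive can be illustrated with a minimal sketch. This is not the paper's method (which predicts the ladder without any encoding); it only shows the baseline idea of trial-encoding a title at several resolutions and bitrates and keeping, for each target bitrate, the resolution with the best quality. The function name, the resolutions, the bitrates, and the quality scores below are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical trial-encode results for one title:
# resolution -> list of (bitrate in kbps, quality score).
# These numbers are invented for illustration and are not from the paper.
TRIAL_ENCODES: Dict[str, List[Tuple[int, float]]] = {
    "540p": [(500, 62.0), (1000, 74.0), (2000, 80.0), (4000, 82.0)],
    "1080p": [(500, 50.0), (1000, 70.0), (2000, 85.0), (4000, 93.0)],
}


def per_title_ladder(
    encodes: Dict[str, List[Tuple[int, float]]],
    target_bitrates: List[int],
) -> List[Tuple[int, str, float]]:
    """For each target bitrate, keep the resolution with the highest quality."""
    ladder = []
    for target in target_bitrates:
        best = None
        for resolution, points in encodes.items():
            for bitrate, quality in points:
                if bitrate == target and (best is None or quality > best[2]):
                    best = (target, resolution, quality)
        if best is not None:
            ladder.append(best)
    return ladder


ladder = per_title_ladder(TRIAL_ENCODES, [500, 1000, 2000, 4000])
for bitrate, resolution, quality in ladder:
    print(f"{bitrate} kbps -> {resolution} (quality {quality})")
```

With these made-up numbers the lower resolution wins at 500 and 1000 kbps and the higher resolution wins at 2000 and 4000 kbps. The cost of this baseline grows with every resolution, bitrate, and codec setting that has to be trial-encoded, which is the expense a prediction-based approach aims to avoid.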
Author Affiliations
Telili, Ahmed: Univ. Rennes, INSA Rennes, CNRS, IETR - UMR 6164, Rennes, France
Hamidouche, Wassim: Univ. Rennes, INSA Rennes, CNRS, IETR - UMR 6164, Rennes, France
Fezza, Sid Ahmed: National Higher School of Telecommunications and ICT, Oran, Algeria
Morin, Luce: Univ. Rennes, INSA Rennes, CNRS, IETR - UMR 6164, Rennes, France
EISBN 9781728198354 (ISBN-10: 1728198356)
Funding Région Bretagne (funder ID 10.13039/501100011697)
OpenAccessLink https://hal.science/hal-04356639
Subjects Adaptive video streaming; Bit rate; Bitrate ladder; Buildings; Computational modeling; Feature extraction; HEVC; Image coding; Streaming media; Transformers; Video compression; Vision transformer
URI https://ieeexplore.ieee.org/document/10222094