Learning-Based Online QoE Optimization in Multi-Agent Video Streaming

Bibliographic Details
Published in	Algorithms Vol. 15; no. 7; p. 227
Main Authors	Wang, Yimeng; Agarwal, Mridul; Lan, Tian; Aggarwal, Vaneet
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.07.2022
Subjects	Algorithms; Bandwidths; Decision making; Deep learning; Machine learning; Markov processes; Multiagent systems; Network management systems; Neural networks; Optimization; policy gradient; reinforcement learning; resource allocation; Streaming media; Streaming services; video streaming; Video transmission
Summary:	Video streaming has become a major use of the Internet. The growing popularity of new applications, such as 4K and 360-degree videos, requires that network resources be carefully apportioned among users to achieve optimal Quality of Experience (QoE) and fairness objectives. This results in a challenging online optimization problem, as networks grow increasingly complex and the relevant QoE objectives are often nonlinear functions. Recently, data-driven approaches, deep Reinforcement Learning (RL) in particular, have been successfully applied to network optimization problems by modeling them as Markov decision processes. However, existing multi-agent RL algorithms fail to address nonlinear objective functions of the different agents’ rewards. To this end, we leverage MAPG-finite, a policy gradient algorithm designed for multi-agent learning problems with nonlinear objectives. It allows us to optimize the bandwidth distribution among multiple agents and to maximize QoE and fairness objectives on video streaming rewards. Implementing the proposed algorithm, we compare the MAPG-finite strategy with a number of baselines, including static, adaptive, and single-agent learning policies. The numerical results show that MAPG-finite significantly outperforms the baseline strategies with respect to different objective functions and in various settings, including both constant and adaptive bitrate videos. Specifically, our MAPG-finite algorithm improves QoE by 15.27% and fairness by 22.47% compared to the standard SARSA algorithm for a 2000 KB/s link.
ISSN:	1999-4893
DOI:	10.3390/a15070227