CBMoS: Combinatorial Bandit Learning for Mode Selection and Resource Allocation in D2D Systems

Bibliographic Details
Published in	IEEE Journal on Selected Areas in Communications, Vol. 37, no. 10, pp. 2225-2238
Main Authors	Ortiz, Andrea; Asadi, Arash; Engelhardt, Max; Klein, Anja; Hollick, Matthias
Format	Journal Article
Language	English
Published	New York: IEEE, 01.10.2019; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Combinatorial analysis; combinatorial multi-armed bandits; Commercialization; Complexity; Computational complexity; Computer simulation; Constraints; Device-to-device communication; Device-to-device communications; Distance learning; Flight simulators; Interference; Modal choice; mode selection and resource allocation; Multi-armed bandit problems; online learning; Resource allocation; Resource management; Solution space; Throughput; Variation; Wireless fidelity; Wireless networks
ISSN	0733-8716, 1558-0008
DOI	10.1109/JSAC.2019.2933764

Summary:	The complexity of the mode selection and resource allocation (MS&RA) problem has hampered the commercialization of Device-to-Device (D2D) communication in 5G networks. Furthermore, the combinatorial nature of MS&RA has forced the majority of existing proposals to focus on constrained scenarios or offline solutions to contain the size of the problem. Given the real-time constraints in actual deployments, a reduction in computational complexity is necessary. Adaptability is another key requirement for mobile networks, which are exposed to constant changes such as channel quality fluctuations and mobility. In this article, we propose an online learning technique, CBMoS, which leverages combinatorial multi-armed bandits (CMAB) to tackle the combinatorial nature of MS&RA. Furthermore, our two-stage CMAB design results in a tight model, which eliminates the theoretically feasible but practically invalid options from the solution space. We prototype the first SDR-based D2D testbed to verify the performance of CBMoS under real-world conditions. The simulations confirm that the fast learning speed of CBMoS allows it to outperform the benchmark schemes by up to 132%. In experiments, CBMoS performs even better (up to 142%) than in the simulations. This stems from the adaptability and fast learning speed of CBMoS in the presence of high channel dynamics, which cannot be captured by the statistical channel models used in the simulators.
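For readers unfamiliar with combinatorial multi-armed bandits, the following is a minimal, generic sketch of the idea and not the CBMoS algorithm itself, whose two-stage design is not described in this record. It assumes each (link, mode) pair is a base arm with a reward in [0, 1], that the learner picks one mode per link in every round, and that at most `max_d2d` links may use D2D mode at once. The class name `ComboUCB`, the cap `max_d2d`, the UCB exploration constant, and the reward probabilities in `run_demo` are all hypothetical choices made for illustration.

```python
import math
import random

CELLULAR, D2D = 0, 1  # hypothetical mode labels for this sketch


class ComboUCB:
    """UCB learner over (link, mode) base arms with a cap on D2D links."""

    def __init__(self, n_links, max_d2d):
        self.n_links = n_links
        self.max_d2d = max_d2d  # assumed resource constraint, not from the paper
        self.counts = [[0, 0] for _ in range(n_links)]
        self.means = [[0.0, 0.0] for _ in range(n_links)]
        self.t = 0

    def _index(self, link, mode):
        n = self.counts[link][mode]
        if n == 0:
            return 1e9  # large finite value so untried base arms get explored
        return self.means[link][mode] + math.sqrt(1.5 * math.log(self.t) / n)

    def select(self):
        """Return one mode per link, maximising the sum of UCB indices."""
        self.t += 1
        gains = []
        for link in range(self.n_links):
            # Gain of switching this link from cellular to D2D mode.
            gain = self._index(link, D2D) - self._index(link, CELLULAR)
            gains.append((gain, link))
        gains.sort(reverse=True)
        choice = [CELLULAR] * self.n_links
        # Taking the largest positive gains is exact for this simple constraint.
        for gain, link in gains[: self.max_d2d]:
            if gain > 0:
                choice[link] = D2D
        return choice

    def update(self, choice, rewards):
        """Semi-bandit feedback: one observed reward per played base arm."""
        for link, (mode, reward) in enumerate(zip(choice, rewards)):
            self.counts[link][mode] += 1
            n = self.counts[link][mode]
            self.means[link][mode] += (reward - self.means[link][mode]) / n


def run_demo(rounds=2000, seed=0):
    """Run the learner against made-up Bernoulli rewards."""
    rng = random.Random(seed)
    # (cellular, d2d) success probabilities per link; illustrative values only.
    true_mean = [(0.2, 0.9), (0.2, 0.8), (0.2, 0.1), (0.2, 0.1)]
    learner = ComboUCB(n_links=len(true_mean), max_d2d=2)
    for _ in range(rounds):
        choice = learner.select()
        rewards = [1.0 if rng.random() < true_mean[link][mode] else 0.0
                   for link, mode in enumerate(choice)]
        learner.update(choice, rewards)
    return learner
```

Calling `run_demo()` returns the learner, whose `counts` attribute shows how often each mode was played per link. The point of the sketch is that the learner keeps statistics per base arm rather than per joint assignment, so the amount of state grows with the number of links and not with the number of possible assignments.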