Distributed and Distribution-Robust Meta Reinforcement Learning (D$^{2}$-RMRL) for Data Pre-Storage and Routing in Cube Satellite Networks

Bibliographic Details
Published in	IEEE Journal of Selected Topics in Signal Processing, Vol. 17, no. 1, pp. 128-141
Main Authors	Hu, Ye; Wang, Xiaodong; Saad, Walid
Format	Journal Article
Language	English
Published	New York: IEEE, 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Actor-critic; Algorithms; cube satellite network; data pre-storage; Decomposition; Heuristic algorithms; Logic gates; Machine learning; Markov processes; meta learning; multi-agent reinforcement learning; Optimization; Robustness; Routing; Satellite communication; Satellite networks; Satellites; Task analysis; Training; value decomposition
ISSN	1932-4553 1941-0484
DOI	10.1109/JSTSP.2022.3232944

Summary:	This paper studies the problem of data pre-storage and routing in dynamic, resource-constrained cube satellite networks. In such a network, each cube satellite delivers requested data to the user clusters under its coverage. A group of ground gateways routes and pre-stores certain data to the satellites so that ground users can be served directly with the pre-stored data. This pre-storage and routing design problem is formulated as a decentralized Markov decision process (Dec-MDP), in which the goal is to find the optimal strategy that maximizes the pre-store hit rate, i.e., the fraction of users served directly with pre-stored data. To obtain the optimal strategy, a distributed distribution-robust meta reinforcement learning (D$^{2}$-RMRL) algorithm is proposed that consists of three key ingredients: value decomposition, to achieve the global optimum in a distributed setting with minimum communication overhead; meta learning, to obtain an optimal initialization that reduces training time under dynamic conditions; and pre-training, to further speed up the meta-training procedure. Simulation results show that, using the proposed value decomposition and meta-training techniques, the satellite networks achieve a 31.8% improvement in pre-store hits and a 40.7% improvement in convergence speed compared to a baseline reinforcement learning algorithm. Moreover, the proposed pre-training mechanism shortens the meta-learning procedure by up to 43.7%.
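
The pre-store hit rate named in the summary is defined there as the fraction of users served directly with pre-stored data. The following is a minimal Python sketch of that metric only, not the authors' implementation. The function name `pre_store_hit_rate`, the dictionary layout, and the simplification that each user requests exactly one item and is covered by exactly one satellite are all assumptions made for illustration.

```python
from typing import Dict, Set


def pre_store_hit_rate(
    user_requests: Dict[str, str],    # hypothetical: user id -> requested data item
    user_satellite: Dict[str, str],   # hypothetical: user id -> covering satellite id
    pre_stored: Dict[str, Set[str]],  # hypothetical: satellite id -> pre-stored items
) -> float:
    """Fraction of users whose request is already stored on their covering satellite."""
    if not user_requests:
        return 0.0  # assumption: no users means no hits
    hits = sum(
        1
        for user, item in user_requests.items()
        # A hit occurs when the covering satellite already holds the requested item.
        if item in pre_stored.get(user_satellite[user], set())
    )
    return hits / len(user_requests)


# Toy example: four users, two satellites; three requests are pre-stored.
requests = {"u1": "a", "u2": "b", "u3": "a", "u4": "c"}
coverage = {"u1": "s1", "u2": "s1", "u3": "s2", "u4": "s2"}
stored = {"s1": {"a", "b"}, "s2": {"a"}}
rate = pre_store_hit_rate(requests, coverage, stored)
print(rate)  # 0.75
```

In this toy case only user `u4` misses, because satellite `s2` does not hold item `c`, so three of four users are served directly.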