Learning in multi-agent systems with asymmetric information structure
Published in | Neurocomputing (Amsterdam) Vol. 412; pp. 351 - 359 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 28.10.2020 |
ISSN | 0925-2312 1872-8286 |
DOI | 10.1016/j.neucom.2019.08.112 |
Summary: | This paper studies multi-agent systems with an asymmetric information structure. Because channel capacity in the communication network is limited, information sent along the routing path suffers a transmission delay. Rather than adopting a game-theoretic setting, we formulate the problem as an online quadratic optimization problem subject to a stochastic system with input delay. Since the statistics of the system noise are unknown, the decision-maker cannot apply traditional optimal control strategies. Motivated by online convex optimization theory, we introduce the notion of regret, which measures the cumulative performance difference between the optimal index value when the noise statistics are known (offline) and the index value when they are unknown (online). The contributions of this paper are twofold. First, utilizing the linear minimum mean square biased estimate, we derive a learning-based control policy and characterize its behavior. Second, under some basic assumptions, we prove that the regret grows at a sub-linear rate and is explicitly bounded by O(ln T). |
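
The regret notion described in the summary can be sketched as follows. This is only an illustration under assumed notation: the symbols $R(T)$, $J_T^{\mathrm{on}}$, $J_T^{\mathrm{off}}$ and the constant $c$ are introduced here and are not taken from the paper, and the sign convention (online cost minus offline cost) is likewise an assumption.

```latex
% Assumed notation, not the paper's:
%   J_T^{on}  : cumulative quadratic cost over horizon T when the noise statistics are unknown (online)
%   J_T^{off} : optimal cumulative cost over horizon T when the noise statistics are known (offline)
\[
  R(T) \;=\; J_T^{\mathrm{on}} - J_T^{\mathrm{off}} .
\]
% A logarithmic bound of the kind stated in the summary, for some constant c > 0:
\[
  R(T) \;\le\; c \ln T
  \quad\Longrightarrow\quad
  \lim_{T \to \infty} \frac{R(T)}{T} \;=\; 0 ,
\]
% i.e. the regret is sub-linear and the average per-step gap to the offline optimum vanishes.
```

Read this way, an O(ln T) bound implies that the average per-step cost of the learning-based policy approaches that of the offline optimum as the horizon grows.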