Learning in multi-agent systems with asymmetric information structure

Bibliographic Details
Published in	Neurocomputing (Amsterdam) Vol. 412; pp. 351 - 359
Main Authors	Tan, Cheng; Qi, Qingyuan; Wong, Wing Shing
Format	Journal Article
Language	English
Published	Elsevier B.V. 28.10.2020
Subjects	Asymmetric information; Learning based control policy; Linear minimum mean square unbiased estimation; Online quadratic optimization; Regret
ISSN	0925-2312 1872-8286
DOI	10.1016/j.neucom.2019.08.112

Summary:	In this paper, we study multi-agent systems with an asymmetric information structure. Because the communication network has limited channel capacity, information along the routing path suffers a transmission delay. Instead of adopting a game-theoretic setting, we formulate the problem as an online quadratic optimization problem subject to stochastic systems with input delay. Since the probability statistics of the system noise are unknown, the decision-maker cannot use traditional optimal control strategies. Motivated by online convex optimization theory, we introduce the notion of regret, which measures the cumulative performance difference between the optimal index value when the statistics are known (offline) and the index value when they are unknown (online). The contributions of this paper are twofold. First, using the linear minimum mean square unbiased estimate, we derive a learning-based control policy and characterize its behavior. Second, under some basic assumptions, we prove that the regret grows at a sub-linear rate and is explicitly bounded by O(ln T).
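
The notion of regret described in the summary can be illustrated with a deliberately simplified toy problem. The sketch below is not the paper's model: it assumes a static scalar decision with no dynamics and no input delay, an i.i.d. Gaussian noise with unknown mean, and a sample-mean estimator standing in for the linear minimum mean square unbiased estimate. The function names `simulate_regret` and `expected_regret` and all parameter values are hypothetical. Under these assumptions the expected regret equals mu^2 plus sigma^2 times a harmonic sum, which grows like ln T.

```python
import random


def simulate_regret(horizon, mu=1.0, sigma=0.5, seed=0):
    """Cumulative regret of a sample-mean policy in a toy quadratic problem.

    Assumed setup (hypothetical, not the paper's system): at each step the
    decision-maker picks u_t, then observes noise w_t and pays (u_t - w_t)^2.
    """
    rng = random.Random(seed)
    running_sum = 0.0
    estimate = 0.0  # initial guess used before any noise sample is seen
    regret = 0.0
    history = []
    for t in range(1, horizon + 1):
        w = rng.gauss(mu, sigma)
        online_cost = (estimate - w) ** 2   # statistics unknown: use estimate
        offline_cost = (mu - w) ** 2        # statistics known: use true mean
        regret += online_cost - offline_cost
        history.append(regret)
        # Update the sample-mean estimate with the newly observed noise.
        running_sum += w
        estimate = running_sum / t
    return history


def expected_regret(horizon, mu=1.0, sigma=0.5):
    """Closed-form expected regret for the toy setup above.

    Step 1 costs mu^2 in excess (the initial guess is 0); step t >= 2 costs
    sigma^2 / (t - 1), the variance of a sample mean over t - 1 samples.
    """
    harmonic = sum(1.0 / k for k in range(1, horizon))
    return mu ** 2 + sigma ** 2 * harmonic


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        # Average regret per step shrinks as T grows (sub-linear growth).
        print(T, expected_regret(T), expected_regret(T) / T)
```

In this toy case the per-step average of the expected regret tends to zero as the horizon grows, which is what sub-linear regret means; the logarithmic rate here comes from the harmonic sum and only mirrors, rather than reproduces, the O(ln T) bound stated in the summary.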