Learning in multi-agent systems with asymmetric information structure


Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 412, pp. 351-359
Main Authors: Tan, Cheng; Qi, Qingyuan; Wong, Wing Shing
Format: Journal Article
Language: English
Published: Elsevier B.V., 28.10.2020
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2019.08.112

Summary: In this paper, we study multi-agent systems with an asymmetric information structure. Due to limited channel capacity in the communication network, information along the routing path suffers a transmission delay. Instead of adopting a game-theoretic setting, we formulate the problem as an online quadratic optimization problem subject to stochastic systems involving input delay. Since the probability statistics of the system noise are unknown, the decision-maker cannot use traditional optimal control strategies. Motivated by online convex optimization theory, we introduce the notion of regret, which measures the cumulative performance difference between the optimal index value when the noise statistics are known (offline) and the index value when they are unknown (online). The contributions of this paper are twofold. First, utilizing the linear minimum mean square biased estimate, we derive a learning-based control policy and then characterize its behavior. Second, under some basic assumptions, we further prove that the regret grows at a sub-linear rate and is explicitly bounded by O(ln T).
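
To make the notion of regret and the O(ln T) growth rate concrete, the following is a minimal sketch on a toy problem that is not the delayed multi-agent system studied in the paper. It assumes a scalar one-step system x = u + w with cost x^2, i.i.d. noise w with unknown mean and variance `noise_var`, and a decision-maker that at step t plays the negative sample mean of t earlier noise observations. Under these assumptions the expected excess cost over the offline optimum at step t is `noise_var / t`, so the expected regret is a harmonic sum. The names `expected_regret` and `log_bound` are hypothetical and introduced only for illustration.

```python
import math


def expected_regret(horizon: int, noise_var: float = 1.0) -> float:
    """Expected cumulative regret of the sample-mean policy in the toy problem.

    Assumption: at step t the policy has t i.i.d. noise samples, so its
    estimate of the noise mean has variance noise_var / t, which is also
    the expected excess cost over the offline (statistics-known) policy.
    """
    return sum(noise_var / t for t in range(1, horizon + 1))


def log_bound(horizon: int, noise_var: float = 1.0) -> float:
    """Logarithmic upper bound on the harmonic sum: noise_var * (1 + ln T)."""
    return noise_var * (1.0 + math.log(horizon))


if __name__ == "__main__":
    # Regret keeps growing, but the average regret per step shrinks toward zero.
    for horizon in (10, 100, 1000, 10000):
        regret = expected_regret(horizon)
        print(horizon, round(regret, 3), round(log_bound(horizon), 3),
              round(regret / horizon, 5))
```

In this toy setting the regret stays below a constant multiple of ln T, so the average regret per step tends to zero as the horizon grows; this is what sub-linear regret growth means.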