Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization

Published in IEEE transactions on automatic control pp. 1 - 16
Main Authors Park, Shinkyu; Leonard, Naomi Ehrich
Format Journal Article
Language English
Published IEEE 2025
Abstract We study a multi-agent decision problem in large population games. Agents from multiple populations select strategies for repeated interactions with one another. At each stage of these interactions, agents use their decision-making model to revise their strategy selections based on payoffs determined by an underlying game. Their goal is to learn the strategies that correspond to the Nash equilibrium of the game. However, when games are subject to time delays, conventional decision-making models from the population game literature may result in oscillations in the strategy revision process or convergence to an equilibrium other than the Nash. To address this problem, we propose the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model, along with an algorithm that iteratively updates the model's regularization parameter across a network of communicating agents. Using passivity-based convergence analysis techniques, we show that the KLD-RL model achieves convergence to the Nash equilibrium without oscillations, even for a class of population games that are subject to time delays. We demonstrate our main results numerically on a two-population congestion game and a two-population zero-sum game.
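The abstract names a Kullback-Leibler divergence regularized model but does not give its equations. As a rough illustration only, the sketch below computes the standard closed-form maximizer of a linear payoff minus a KL penalty over the probability simplex, where each weight is proportional to the reference weight times exp(payoff / eta). The function name `kl_regularized_response` and the parameters `payoffs`, `reference`, and `eta` are hypothetical and are not taken from the paper; the paper's delay handling and its networked parameter-update algorithm are not represented here.

```python
import math


def kl_regularized_response(payoffs, reference, eta):
    """Maximize  sum_i z_i * payoffs_i - eta * KL(z || reference)  over the simplex.

    Illustrative assumption, not the paper's stated model. The maximizer has
    the closed form  z_i proportional to reference_i * exp(payoffs_i / eta).
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    if len(payoffs) != len(reference):
        raise ValueError("payoffs and reference must have the same length")
    # Shift by the largest payoff so the exponentials cannot overflow.
    p_max = max(payoffs)
    weights = [r * math.exp((p - p_max) / eta) for p, r in zip(payoffs, reference)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three strategies, a non-uniform reference distribution, moderate regularization.
    print(kl_regularized_response([1.0, 2.0, 3.0], [0.2, 0.3, 0.5], eta=1.0))
```

A large `eta` keeps the response close to `reference`, while a small `eta` concentrates it on the highest-payoff strategies; when all payoffs are equal the response equals `reference` exactly.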
Author Park, Shinkyu (shinkyu7275@gmail.com), Electrical and Computer Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Author Leonard, Naomi Ehrich (naomi@princeton.edu), Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ, USA
CODEN IETAA9
DOI 10.1109/TAC.2025.3561108
Discipline Engineering
EISSN 1558-2523
Genre orig-research
ISSN 0018-9286
IsPeerReviewed true
IsScholarly true
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
Subject Terms
Convergence
decision making
Delay effects
Demand response
evolutionary dynamics
game theory
Games
Multi-agent systems
Nash equilibrium
nonlinear systems
Oscillators
Roads
Stability analysis
Training
Vectors
URI https://ieeexplore.ieee.org/document/10966158