Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization

Bibliographic Details
Published in	IEEE Transactions on Automatic Control, pp. 1-16
Main Authors	Park, Shinkyu; Leonard, Naomi Ehrich
Format	Journal Article
Language	English
Published	IEEE 2025
Subjects	Convergence; decision making; Delay effects; Demand response; evolutionary dynamics; game theory; Games; Multi-agent systems; Nash equilibrium; nonlinear systems; Oscillators; Roads; Stability analysis; Training; Vectors

Abstract	We study a multi-agent decision problem in large population games. Agents from multiple populations select strategies for repeated interactions with one another. At each stage of these interactions, agents use their decision-making model to revise their strategy selections based on payoffs determined by an underlying game. Their goal is to learn the strategies that correspond to the Nash equilibrium of the game. However, when games are subject to time delays, conventional decision-making models from the population game literature may result in oscillations in the strategy revision process or convergence to an equilibrium other than the Nash equilibrium. To address this problem, we propose the Kullback-Leibler Divergence Regularized Learning (KLD-RL) model, along with an algorithm that iteratively updates the model's regularization parameter across a network of communicating agents. Using passivity-based convergence analysis techniques, we show that the KLD-RL model achieves convergence to the Nash equilibrium without oscillations, even for a class of population games that are subject to time delays. We demonstrate our main results numerically on a two-population congestion game and a two-population zero-sum game.
Author	Leonard, Naomi Ehrich; Park, Shinkyu
Author_xml	– sequence: 1; givenname: Shinkyu; surname: Park; fullname: Park, Shinkyu; email: shinkyu7275@gmail.com; organization: Electrical and Computer Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia – sequence: 2; givenname: Naomi Ehrich; surname: Leonard; fullname: Leonard, Naomi Ehrich; email: naomi@princeton.edu; organization: Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, NJ, USA
CODEN	IETAA9
ContentType	Journal Article
DOI	10.1109/TAC.2025.3561108
Discipline	Engineering
EISSN	1558-2523
EndPage	16
ExternalDocumentID	10_1109_TAC_2025_3561108 10966158
Genre	orig-research
ISSN	0018-9286
IsPeerReviewed	true
IsScholarly	true
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
PageCount	16
PublicationCentury	2000
PublicationDate	2025-00-00
PublicationDateYYYYMMDD	2025-01-01
PublicationDate_xml	– year: 2025 text: 2025-00-00
PublicationDecade	2020
PublicationTitle	IEEE transactions on automatic control
PublicationTitleAbbrev	TAC
PublicationYear	2025
Publisher	IEEE
Publisher_xml	– name: IEEE
StartPage	1
Title	Learning with Delayed Payoffs in Population Games using Kullback-Leibler Divergence Regularization
URI	https://ieeexplore.ieee.org/document/10966158