Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using a density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond exist...

Bibliographic Details
Published in	arXiv.org
Main Authors	Zhang, Shangtong; Tachet, Remi; Laroche, Romain
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 24.10.2022
Subjects	Algorithms; Density ratio; Markov chains