Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

In this paper, we establish the global optimality and convergence rate of an off-policy actor critic algorithm in the tabular setting without using density ratio to correct the discrepancy between the state distribution of the behavior policy and that of the target policy. Our work goes beyond exist...

Bibliographic Details
Published in arXiv.org
Main Authors Zhang, Shangtong; Tachet, Remi; Laroche, Romain
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 24.10.2022