Modeling Bellman-error with logistic distribution with applications in reinforcement learning

Bibliographic Details
Published in: Neural Networks, Vol. 177, p. 106387
Main Authors: Lv, Outongyi; Zhou, Bingxin; Yang, Lin F.
Format: Journal Article
Language: English
Published: United States: Elsevier Ltd, 01.09.2024
Summary: In modern Reinforcement Learning (RL) approaches, optimizing the Bellman error is a critical element across various algorithms, notably in deep Q-Learning and related methodologies. Traditional approaches predominantly employ the mean-squared Bellman error (MSELoss) as the standard loss function. However, the assumption that Bellman errors follow a Gaussian distribution may oversimplify the nuanced characteristics of RL applications. In this work, we revisit the distribution of the Bellman error in RL training, demonstrating that it tends to follow the Logistic distribution rather than the commonly assumed Normal distribution. We propose replacing MSELoss with a Logistic maximum likelihood function (LLoss) and rigorously test this hypothesis through extensive numerical experiments across diverse online and offline RL environments. Our findings consistently show that integrating the Logistic correction into the loss functions of various baseline RL methods leads to superior performance compared to their MSE counterparts. Additionally, we employ Kolmogorov–Smirnov tests to substantiate that the Logistic distribution offers a more accurate fit for approximating Bellman errors. This study also makes a novel theoretical contribution by establishing a clear connection between the distribution of the Bellman error and the practice of proportional reward scaling, a common technique for performance enhancement in RL. Moreover, we explore the sample-accuracy trade-off involved in approximating the Logistic distribution, leveraging the Bias–Variance decomposition to mitigate excessive computational cost. The theoretical and empirical insights presented in this study lay a significant foundation for future research, potentially advancing methodologies and understanding in RL, particularly in the distribution-based optimization of the Bellman error.
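The summary proposes replacing MSELoss with a Logistic maximum likelihood loss (LLoss). The paper's exact formulation is not reproduced in this record; as an illustration only, a zero-mean Logistic negative log-likelihood over a batch of Bellman errors could be sketched as below (the function names `logistic_nll`/`mse` and the fixed `scale` parameter are assumptions, not the authors' code):

```python
import numpy as np

def logistic_nll(bellman_errors, scale=1.0):
    """Mean negative log-likelihood of residuals under a zero-mean
    Logistic(0, scale) distribution.

    pdf:  f(x) = exp(-x/s) / (s * (1 + exp(-x/s))**2)
    so   -log f(x) = x/s + log(s) + 2*log(1 + exp(-x/s)).
    """
    z = np.asarray(bellman_errors, dtype=float) / scale
    # logaddexp(0, -z) is a numerically stable log(1 + exp(-z)).
    return float(np.mean(z + np.log(scale) + 2.0 * np.logaddexp(0.0, -z)))

def mse(bellman_errors):
    """Standard mean-squared Bellman error, for comparison."""
    e = np.asarray(bellman_errors, dtype=float)
    return float(np.mean(e ** 2))
```

Unlike the quadratic MSE, this negative log-likelihood grows only linearly in the residual's magnitude for large errors, which gives some intuition for why a Logistic-based loss can behave differently from a Gaussian-based one.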
• We challenge the assumption of a Normally distributed Bellman error, modeling it instead with a Logistic distribution.
• We analyze the Logistic distribution's sampling error via Bias–Variance decomposition to select an optimal batch size.
• We confirm the Logistic distribution's robustness for the Bellman error through extensive testing and Kolmogorov–Smirnov tests.
Novelty: We provide the first rigorous Logistic modeling scheme for the distribution of the Bellman error and relate it to the reward scaling problem.
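The highlights and summary both mention Kolmogorov–Smirnov tests for comparing the Logistic and Normal fits. A minimal sketch of that kind of check, using synthetic residuals in place of real Bellman errors (the seed, sample size, and variable names are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in data: the paper uses Bellman errors collected during training;
# here we draw Logistic noise so the script is self-contained.
residuals = rng.logistic(loc=0.0, scale=1.0, size=20_000)

# Fit each candidate family by maximum likelihood, then run a KS test
# of the data against each fitted distribution.
loc_l, scale_l = stats.logistic.fit(residuals)
loc_n, scale_n = stats.norm.fit(residuals)
ks_logistic = stats.kstest(residuals, "logistic", args=(loc_l, scale_l))
ks_normal = stats.kstest(residuals, "norm", args=(loc_n, scale_n))
# A smaller KS statistic means the fitted CDF tracks the data more closely.
```

Note that p-values from a KS test with parameters estimated from the same data are optimistically biased, so comparing the KS statistics themselves is the safer reading.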
ISSN: 0893-6080
EISSN: 1879-2782
DOI: 10.1016/j.neunet.2024.106387