Embedding a priori knowledge in reinforcement learning

Bibliographic Details
Published in Journal of intelligent & robotic systems Vol. 21; no. 1; pp. 51-71
Main Author RIBEIRO, C. H. C
Format Journal Article
Language English
Published Dordrecht Kluwer 1998
Springer Nature B.V
Summary: In recent years, temporal difference methods have been put forward as convenient tools for reinforcement learning. Techniques based on temporal differences, however, suffer from a serious drawback: as stochastic adaptive algorithms, they may need extensive exploration of the state-action space before convergence is achieved. Although the basic methods are now reasonably well understood, it is precisely the structural simplicity of the reinforcement learning principle - learning through experimentation - that causes these excessive demands on the learning agent. Additionally, one must consider that the agent is very rarely a tabula rasa: some rough knowledge about characteristics of the surrounding environment is often available. In this paper, I present methods for embedding a priori knowledge in a reinforcement learning technique in such a way that both the mathematical structure of the basic learning algorithm and the capacity to generalise experience across the state-action space are preserved. Extensive experimental results show that the resulting variants may lead to good performance, provided a sensible balance is struck between risky use of imprecise prior knowledge and cautious use of learning experience.
ISSN: 0921-0296; 1573-0409
DOI: 10.1023/A:1007968115863