Memory-based Deep Reinforcement Learning for POMDPs
Format: Journal Article
Language: English
Published: 24.02.2021
Summary: A promising characteristic of Deep Reinforcement Learning (DRL) is its
capability to learn an optimal policy in an end-to-end manner without relying
on feature engineering. However, most approaches assume a fully observable
state space, i.e. fully observable Markov Decision Processes (MDPs). In
real-world robotics, this assumption is impractical because of issues such as
sensor sensitivity limitations, sensor noise, and uncertainty about whether
the observation design is complete. These scenarios lead to Partially
Observable MDPs (POMDPs). In this paper, we propose the
Long-Short-Term-Memory-based Twin Delayed Deep Deterministic Policy Gradient
(LSTM-TD3) by introducing a memory component to TD3, and compare its
performance with other DRL algorithms in both MDPs and POMDPs. Our results
demonstrate the significant advantages of the memory component in addressing
POMDPs, including the ability to handle missing and noisy observation data.
DOI: 10.48550/arxiv.2102.12344
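The abstract describes adding a memory component to TD3 so the policy can act
on a history of observations rather than a single, possibly incomplete, state.
Below is a minimal sketch of that idea, not the authors' released
implementation: an LSTM summarizes past observation-action pairs, and its
summary is combined with features of the current observation before the policy
head. The layer sizes, history window, and two-branch combination are
illustrative assumptions inferred from the abstract alone.

```python
# Sketch (assumed architecture, not the paper's exact code): an LSTM-based
# actor for TD3 under partial observability.
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, obs_dim, act_dim, act_limit, hidden=128):
        super().__init__()
        # Memory branch: encodes the sequence of past (obs, act) pairs.
        self.memory = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        # Current-feature branch: encodes the latest observation alone.
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Policy head: combines the memory summary with current features.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.act_limit = act_limit

    def forward(self, obs, hist_obs, hist_act):
        # hist_obs: (batch, T, obs_dim); hist_act: (batch, T, act_dim)
        hist = torch.cat([hist_obs, hist_act], dim=-1)
        summary, _ = self.memory(hist)   # (batch, T, hidden)
        summary = summary[:, -1]         # keep the last hidden state
        feat = self.current(obs)         # (batch, hidden)
        joint = torch.cat([summary, feat], dim=-1)
        return self.act_limit * self.head(joint)

# Usage: act on the current (possibly noisy or missing-data) observation
# plus a short history window; dimensions here are arbitrary examples.
actor = LSTMActor(obs_dim=17, act_dim=6, act_limit=1.0)
obs = torch.randn(1, 17)
hist_obs, hist_act = torch.randn(1, 5, 17), torch.randn(1, 5, 6)
action = actor(obs, hist_obs, hist_act)  # shape (1, 6), values in [-1, 1]
```

In a full TD3 setup the twin critics would take the same history inputs; the
history summary is what lets the agent disambiguate states that look identical
from a single noisy observation.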