Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications

Bibliographic Details
Main Authors	Wang, Jun; Hasanbeig, Hosein; Tan, Kaiyuan; Sun, Zihe; Kantaros, Yiannis
Format	Journal Article
Language	English
Published	28.11.2023
Subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Robotics

Summary:	This paper addresses the problem of designing optimal control policies for mobile robots with mission and safety requirements specified using Linear Temporal Logic (LTL). We consider robots with unknown stochastic dynamics operating in environments with unknown geometric structure. The robots are equipped with sensors that allow them to detect obstacles. Our goal is to synthesize a control policy that maximizes the probability of satisfying an LTL-encoded task in the presence of motion and environmental uncertainty. Several deep reinforcement learning (DRL) algorithms have recently been proposed to address similar problems. A common limitation of related work is slow learning. To address this issue, we propose a novel DRL algorithm that learns control policies notably faster than similar methods. Its sample efficiency is due to a mission-driven exploration strategy that prioritizes exploration in directions that may contribute to mission accomplishment. Identifying these directions relies on an automaton representation of the LTL task as well as a learned neural network that (partially) models the unknown system dynamics. We provide comparative experiments demonstrating the efficiency of our algorithm on robot navigation tasks in unknown environments.
DOI:	10.48550/arxiv.2311.17059
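
The summary describes exploration that is steered by an automaton for the LTL task together with a learned dynamics model. The sketch below is one way such a rule could look, not the authors' algorithm: it is a plain epsilon-greedy policy whose exploratory step is, with some probability, replaced by the action predicted to advance the automaton. Everything named here is an assumption made for illustration: the toy automaton `DFA_EDGES`, the labels `"goal"` and `"none"`, the parameters `eps` and `delta`, and the callable `predict_label_probs`, which stands in for the learned network and is assumed to return, for the robot's current state, the probability of observing each label after taking an action.

```python
import random
from collections import deque

# Hypothetical toy automaton for "eventually reach 'goal'":
# state 0 moves to accepting state 1 once the label "goal" is observed.
DFA_EDGES = {0: {"goal": 1, "none": 0}, 1: {"goal": 1, "none": 1}}
ACCEPTING = {1}


def distance_to_accepting(edges, accepting):
    """Breadth-first search on reversed edges: hops from each state to acceptance."""
    reverse = {q: set() for q in edges}
    for q, out in edges.items():
        for q_next in out.values():
            reverse[q_next].add(q)
    dist = {q: 0 for q in accepting}
    queue = deque(accepting)
    while queue:
        q = queue.popleft()
        for p in reverse[q]:
            if p not in dist:
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist


def biased_action(q, actions, predict_label_probs, edges, dist):
    """Pick the action most likely, per the model, to reduce the automaton distance."""
    inf = float("inf")

    def progress_prob(action):
        probs = predict_label_probs(action)  # assumed: dict label -> probability
        return sum(p for label, p in probs.items()
                   if dist.get(edges[q][label], inf) < dist.get(q, inf))

    return max(actions, key=progress_prob)


def select_action(q, actions, q_values, predict_label_probs, edges, dist,
                  eps, delta, rng=random):
    """Exploit with probability 1 - eps; otherwise explore, biased with probability delta."""
    if rng.random() >= eps:
        return max(actions, key=lambda a: q_values[a])  # greedy step
    if rng.random() < delta:
        return biased_action(q, actions, predict_label_probs, edges, dist)
    return rng.choice(actions)  # uniform random exploration
```

Setting `delta` to 0 recovers ordinary epsilon-greedy exploration, so in this sketch the parameter controls how strongly the task structure shapes exploration. A real implementation would also need the model to be trained online and would operate on continuous states, neither of which is shown here.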