Regret Emotion Based Reinforcement Learning for Path Planning in Autonomous Agents

Bibliographic Details
Published in: International Conference on Affective Computing and Intelligent Interaction and Workshops, pp. 266-274
Main Authors: Soman, Gayathri; Judy, M.V.; Madria, Sanjay
Format: Conference Proceeding
Language: English
Published: IEEE, 15.09.2024
ISSN: 2156-8111
DOI: 10.1109/ACII63134.2024.00035

Summary: Path planning (PP) is a major concern in the domain of autonomous agents, with applications ranging from everyday navigation to disaster response. Reinforcement learning approaches to path planning have advanced significantly in recent years, and incorporating emotions into reinforcement learning agents improves their ability to make good decisions in a given situation. Here we propose a methodology that embeds the regret emotion in the epsilon decay strategy of the Q-learning algorithm, using a cumulative regret value composed of the experienced regret and the anticipated regret arising in the agent. Experienced regret can be modeled as a function that minimizes the error between the optimal action and the action actually taken in a given state. Anticipated regret can be modeled using temporal-difference errors. In the epsilon-greedy strategy, the regret emotion is used to balance exploitation and exploration. To exhibit the best possible behaviour, the agent should explore the environment more during the early phase of learning and shift toward exploitation in the later phases. With the standard epsilon-greedy algorithm, however, epsilon decays too quickly, which restricts the agent's ability to explore. In the proposed method, the agent decides at each state whether to exploit or explore based on the intensity of its regret emotion. Using the proposed methodology, the agent explores the environment more frequently, which helps it acquire more rewards and learn the environment more rapidly. The proposed method outperforms the traditional epsilon-greedy Q-learning decay both in the rate at which epsilon decays and in the amount of reward the agent gains.
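
The record describes the mechanism only at a high level, so the following Python sketch illustrates one plausible reading of it. The specific functional forms chosen here for experienced regret, anticipated regret, and the regret-driven epsilon update are assumptions for illustration and are not taken from the paper.

import numpy as np

# Illustrative sketch only: the regret definitions and the epsilon update
# below are assumed forms, not the paper's exact formulation.
class RegretEpsilonQLearner:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 eps_min=0.05, eps_max=1.0, regret_scale=1.0):
        self.Q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.eps_min, self.eps_max = eps_min, eps_max
        self.regret_scale = regret_scale
        self.epsilon = eps_max

    def select_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.Q.shape[1])
        return int(np.argmax(self.Q[state]))

    def update(self, state, action, reward, next_state):
        # Experienced regret: gap between the best known action value and
        # the value of the action actually taken (assumed form).
        experienced_regret = np.max(self.Q[state]) - self.Q[state, action]

        # Anticipated regret: magnitude of the temporal-difference error
        # (assumed form, following the abstract's TD-error suggestion).
        td_target = reward + self.gamma * np.max(self.Q[next_state])
        td_error = td_target - self.Q[state, action]
        anticipated_regret = abs(td_error)

        # Cumulative regret drives epsilon: high regret keeps exploration
        # high, low regret lets epsilon settle toward eps_min (assumed rule).
        cumulative_regret = experienced_regret + anticipated_regret
        self.epsilon = self.eps_min + (self.eps_max - self.eps_min) * (
            1.0 - np.exp(-self.regret_scale * cumulative_regret))

        # Standard Q-learning value update.
        self.Q[state, action] += self.alpha * td_error

In this reading, epsilon is no longer decayed on a fixed schedule; it is recomputed at every update from the current cumulative regret, so exploration persists wherever the agent still regrets its choices and fades where its value estimates are already reliable.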