Enhancing Deep Reinforcement Learning with Executable Specifications

Bibliographic Details
Published in: Proceedings of the IEEE/ACM International Conference on Software Engineering Companion, pp. 213-217
Main Author: Yerushalmi, Raz
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2023

Summary: Deep reinforcement learning (DRL) has become a dominant paradigm for using deep learning to carry out tasks where complex policies are learned for reactive systems. However, these policies are "black boxes": opaque to humans and known to be susceptible to bugs. For example, it is hard, if not impossible, to guarantee that a trained DRL agent adheres to specific safety and fairness properties that may be required. The first and primary contribution of this doctoral dissertation is a novel approach to developing DRL agents that improves the training process by pushing the learned policy toward both high performance on its main task and compliance with such safety and fairness properties, achieving a high probability of compliance without compromising the resulting agent's performance. The approach is realized by incorporating domain-specific knowledge, captured as key properties that domain experts define in behavioral languages natural to them, directly into the DRL optimization process. We have validated the proposed approach by extending the AI-Gym Python framework [1] for training DRL agents and integrating it with the BP-Py framework [2] for specifying scenario-based models [3], in a way that allows scenario objects to affect the training process through reward and cost functions, demonstrating dramatic improvement in the safety and performance of the agent. In addition, we have validated the resulting DRL agents using the Marabou verifier [4], confirming that they fully comply with the required safety and fairness properties. We have applied the approach to train DRL agents for use cases from the network communication and robotic navigation domains, with strong results. A second contribution of this doctoral dissertation is to develop and leverage probabilistic verification methods for deep neural networks, to overcome the scalability limitations that currently restrict the applicability of neural network verification to practical DRL agents. We carried out an initial validation of this concept in the domain of image classification, showing promising results.
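
To picture how scenario objects can affect training through reward and cost functions, the following Python sketch wraps a Gym environment so that a property check subtracts a penalty from the task reward whenever a step violates it. The wrapper class name, the property_fn callback, and the penalty value are hypothetical names introduced here for illustration, not the paper's actual artifact; the sketch assumes the classic 4-tuple Gym step() API.

    import gym

    class SpecificationRewardWrapper(gym.Wrapper):
        # Turns violations of an executable property into a training cost.
        # property_fn(obs, action) -> bool stands in for a scenario object
        # encoding a domain expert's safety or fairness property.
        def __init__(self, env, property_fn, penalty=10.0):
            super().__init__(env)
            self.property_fn = property_fn
            self.penalty = penalty

        def step(self, action):
            # Classic Gym returns a 4-tuple; newer Gymnasium returns 5 values.
            obs, reward, done, info = self.env.step(action)
            if self.property_fn(obs, action):
                reward -= self.penalty  # steer the policy away from violations
                info["spec_violation"] = True
            return obs, reward, done, info

    # Toy usage: penalize CartPole states whose cart drifts far from center.
    env = SpecificationRewardWrapper(
        gym.make("CartPole-v1"),
        property_fn=lambda obs, action: abs(obs[0]) > 1.0,
    )

Any off-the-shelf DRL algorithm can then train against the wrapped environment, so the compliance pressure is applied without modifying the learner itself.

The post-training verification step can similarly be pictured with the Maraboupy bindings of the Marabou verifier [4]. The model file name, input region, and output bound below are illustrative assumptions, and the exact return signature of solve() varies across Marabou versions; the general pattern is to encode the negation of the property, so that an unsatisfiable query certifies compliance on the checked region.

    from maraboupy import Marabou

    # Load the trained policy network (hypothetical ONNX export).
    network = Marabou.read_onnx("trained_agent.onnx")
    inputs = network.inputVars[0].flatten()
    outputs = network.outputVars[0].flatten()  # indexing may differ by version

    # Restrict attention to a bounded input region...
    for v in inputs:
        network.setLowerBound(v, -1.0)
        network.setUpperBound(v, 1.0)

    # ...and assert the NEGATION of the property "output 0 stays below 0".
    network.setLowerBound(outputs[0], 0.0)

    # An unsatisfiable query means no counterexample exists, i.e., the agent
    # complies with the property on this region.  Note: solve() returns
    # (vals, stats) or (exitCode, vals, stats) depending on the version.
    result = network.solve()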
ISSN: 2574-1934
DOI: 10.1109/ICSE-Companion58688.2023.00058