Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Bibliographic Details
Published in	2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6149-6153
Main Authors	Peng, Baolin; Li, Xiujun; Gao, Jianfeng; Liu, Jingjing; Chen, Yun-Nung; Wong, Kam-Fai
Format	Conference Proceeding
Language	English
Published	IEEE, April 2018
Subjects	adversarial learning, Gallium nitride, Learning (artificial intelligence), Motion pictures, Natural languages, policy learning, reinforcement learning, reward function, Task analysis, task-completion dialogue, Training, Trajectory

More Information
Summary:	This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GANs), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions produced by experts. We then incorporate the discriminator as an additional critic in the advantage actor-critic (A2C) framework, encouraging the dialogue agent to explore state-action pairs within regions where it takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can efficiently accelerate policy exploration.
ISSN:	2379-190X
DOI:	10.1109/ICASSP.2018.8461918
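The summary's central idea, a discriminator used as a second critic next to the usual A2C value critic, can be illustrated with a small sketch. This is not the authors' implementation. The linear discriminator over hand-made feature vectors, the log-odds form of the adversarial term, the weighting factor lam, and every function name below are hypothetical choices for illustration only; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_prob(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear discriminator: estimated probability that a
    (state, action) feature vector comes from an expert rather than the agent."""
    logit = sum(w * f for w, f in zip(weights, features))
    return sigmoid(logit)


def discriminator_update(weights: List[float],
                         expert_feats: Sequence[Sequence[float]],
                         agent_feats: Sequence[Sequence[float]],
                         lr: float = 0.1) -> List[float]:
    """One gradient-ascent step on the GAN-style objective
    mean[log D(expert)] + mean[log(1 - D(agent))] over the pooled batch."""
    grad = [0.0] * len(weights)
    for x in expert_feats:          # d/dw log sigmoid(w.x) = (1 - p) * x
        p = discriminator_prob(weights, x)
        for i, xi in enumerate(x):
            grad[i] += (1.0 - p) * xi
    for x in agent_feats:           # d/dw log(1 - sigmoid(w.x)) = -p * x
        p = discriminator_prob(weights, x)
        for i, xi in enumerate(x):
            grad[i] -= p * xi
    n = len(expert_feats) + len(agent_feats)
    return [w + lr * g / n for w, g in zip(weights, grad)]


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.95, done: bool = False) -> float:
    """Standard one-step A2C advantage from the environment-reward critic."""
    target = reward if done else reward + gamma * value_next
    return target - value_s


def adversarial_advantage(d_prob: float, eps: float = 1e-8) -> float:
    """Assumed adversarial term: log-odds that the action looks expert-like.
    Positive when D favours 'expert', zero when D is undecided (0.5)."""
    d = min(max(d_prob, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


def combined_advantage(reward: float, value_s: float, value_next: float,
                       d_prob: float, lam: float = 0.5,
                       gamma: float = 0.95, done: bool = False) -> float:
    """Hypothetical mix of the two critics; lam weights the discriminator critic.
    The actor's policy-gradient step would scale log pi(a|s) by this value."""
    return (td_advantage(reward, value_s, value_next, gamma, done)
            + lam * adversarial_advantage(d_prob))


if __name__ == "__main__":
    w = [0.0, 0.0]
    expert = [[1.0, 0.0]]   # toy feature vector for an expert action
    agent = [[0.0, 1.0]]    # toy feature vector for an agent action
    for _ in range(50):
        w = discriminator_update(w, expert, agent, lr=0.5)
    d = discriminator_prob(w, expert[0])
    print(d > 0.5, combined_advantage(1.0, 0.5, 0.5, d) > td_advantage(1.0, 0.5, 0.5))
```

In this toy run the discriminator learns to score the expert-like feature vector above 0.5, so the adversarial term raises the advantage of that action relative to the environment critic alone. That bonus is the kind of signal the summary describes for steering exploration toward expert-like regions of the state-action space.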