Integrating Pretrained Language Model for Dialogue Policy Evaluation

Bibliographic Details
Published in	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6692-6696
Main Authors	Wang, Hongru; Wang, Huimin; Wang, Zezhong; Wong, Kam-Fai
Format	Conference Proceeding
Language	English
Published	IEEE 23.05.2022
Subjects	Acoustics; Bit error rate; Conferences; Dialogue Policy Learning; Pre-trained Language Model; Predictive models; Reinforcement learning; Reward Shaping; Signal processing; Training

Summary:	Reinforcement Learning (RL) has shown its potential for training a dialogue policy agent to maximize the accumulated rewards given by users. However, the reward can be very sparse, since it is usually provided only at the end of a dialog session, which leads to unaffordable interaction requirements for an acceptable dialog agent. In contrast to the many efforts that alternate between optimizing the policy and recovering the reward, which easily get stuck in local optima and suffer from model collapse, we decompose the adversarial training into two steps: 1) we integrate a pre-trained language model as a discriminator to judge whether the current system action is good enough for the last user action (i.e., next action prediction); 2) the discriminator gives an extra local dense reward to guide the agent's exploration. The experimental results demonstrate that our method significantly improves the complete rate (4.4%) and success rate (8.0%) of the dialogue system.
ISSN:	2379-190X
DOI:	10.1109/ICASSP43922.2022.9747593
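
Illustration: the two steps in the summary (a discriminator scores each system action against the last user action, and that score is added as a dense per-turn reward) might be sketched roughly as below. This is not the authors' implementation. The lookup-table discriminator, the action names, the additive combination, and the `weight` parameter are all assumptions made so the sketch runs without a model; in the paper the judgment comes from a pre-trained language model.

```python
from typing import Callable, List, Tuple

# Hypothetical table of acceptable (user action, system action) pairs.
# It stands in for the pre-trained language model discriminator, which
# this sketch does not attempt to reproduce.
GOOD_PAIRS = {
    ("greet", "greet"),
    ("request_price", "inform_price"),
    ("inform_area", "request_food"),
}


def toy_discriminator(user_action: str, system_action: str) -> float:
    """Return 1.0 if the system action fits the last user action, else 0.0."""
    return 1.0 if (user_action, system_action) in GOOD_PAIRS else 0.0


def shaped_rewards(
    turns: List[Tuple[str, str]],
    final_reward: float,
    discriminator: Callable[[str, str], float] = toy_discriminator,
    weight: float = 0.5,  # assumed scaling factor, not taken from the paper
) -> List[float]:
    """Add a dense per-turn discriminator reward to a sparse session reward.

    The environment reward is zero on every turn except the last, which
    mirrors the sparse end-of-session reward described in the summary.
    """
    rewards = []
    for i, (user_action, system_action) in enumerate(turns):
        env_reward = final_reward if i == len(turns) - 1 else 0.0
        dense_reward = weight * discriminator(user_action, system_action)
        rewards.append(env_reward + dense_reward)
    return rewards


# A three-turn session: the second system action does not fit the user action.
turns = [
    ("greet", "greet"),
    ("request_price", "request_food"),
    ("request_price", "inform_price"),
]
rewards = shaped_rewards(turns, final_reward=10.0)
print(rewards)  # [0.5, 0.0, 10.5]
```

With `weight=0.0` the function falls back to the purely sparse reward, which makes the effect of the dense term easy to compare in isolation.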