Dialog Inpainting: Turning Documents into Dialogs
Many important questions (e.g. "How to eat healthier?") require conversation to establish context and explore in depth. However, conversational question answering (ConvQA) systems have long been stymied by scarce training data that is expensive to collect. To address this problem, we propo...
Saved in:
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
18.05.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Many important questions (e.g. "How to eat healthier?") require conversation
to establish context and explore in depth. However, conversational question
answering (ConvQA) systems have long been stymied by scarce training data that
is expensive to collect. To address this problem, we propose a new technique
for synthetically generating diverse and high-quality dialog data: dialog
inpainting. Our approach takes the text of any document and transforms it into
a two-person dialog between the writer and an imagined reader: we treat
sentences from the article as utterances spoken by the writer, and then use a
dialog inpainter to predict what the imagined reader asked or said in between
each of the writer's utterances. By applying this approach to passages from
Wikipedia and the web, we produce WikiDialog and WebDialog, two datasets
totalling 19 million diverse information-seeking dialogs -- 1,000x larger than
the largest existing ConvQA dataset. Furthermore, human raters judge the answer
adequacy and conversationality of WikiDialog to be as good or better than
existing manually-collected datasets. Using our inpainted data to pre-train
ConvQA retrieval systems, we significantly advance state-of-the-art across
three benchmarks (QReCC, OR-QuAC, TREC CAsT) yielding up to 40% relative gains
on standard evaluation metrics. |
---|---|
DOI: | 10.48550/arxiv.2205.09073 |