Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models
Main Authors | |
Format | Journal Article |
Language | English |
Published | 20.05.2020 |
Subjects | |
Online Access | Get full text |
Summary: | Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences. Given an anchor sentence, our model is trained to predict the text k sentences away using a sampled-softmax objective where the candidates consist of neighboring sentences and sentences randomly sampled from the corpus. On the discourse representation benchmark DiscoEval, our model improves over the previous state-of-the-art by up to 13% and on average 4% absolute across 7 tasks. Our model is the same size as BERT-Base, but outperforms the much larger BERT-Large model and other more recent approaches that incorporate discourse. We also show that CONPONO yields gains of 2%-6% absolute even for tasks that do not explicitly evaluate discourse: textual entailment (RTE), common sense reasoning (COPA) and reading comprehension (ReCoRD). |
DOI: | 10.48550/arxiv.2005.10389 |
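
The summary describes the core CONPONO training signal: given an anchor sentence, score a small candidate set (the true sentence k positions away, other neighbors, and random negatives from the corpus) with a sampled softmax. The snippet below is a minimal sketch of that idea, not the paper's implementation: the toy mean-pooled encoder (standing in for BERT-Base), the per-distance projection, the dot-product scoring, and all names and hyperparameters here are illustrative assumptions.

```python
# Minimal sketch of a CONPONO-style sampled-softmax objective (illustrative only).
# Assumptions beyond the abstract: the toy bag-of-embeddings encoder, the
# per-distance linear projection, and every name/hyperparameter below.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToySentenceEncoder(nn.Module):
    """Stand-in for the BERT-Base encoder used in the paper."""

    def __init__(self, vocab_size=30522, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):  # (batch, seq_len) -> (batch, dim)
        return self.emb(token_ids).mean(dim=1)


class ConponoLikeObjective(nn.Module):
    def __init__(self, dim=128, max_distance=2):
        super().__init__()
        self.encoder = ToySentenceEncoder(dim=dim)
        # One projection per target distance k (e.g. {-2, -1, +1, +2}).
        self.k_proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(2 * max_distance))

    def forward(self, anchor_ids, candidate_ids, k_index, target_index):
        """
        anchor_ids:    (batch, seq_len)          anchor sentence tokens
        candidate_ids: (batch, n_cand, seq_len)  true sentence at distance k,
                                                 other neighbors, random negatives
        k_index:       (batch,)                  which distance k is being predicted
        target_index:  (batch,)                  position of the true candidate
        """
        b, n_cand, seq_len = candidate_ids.shape
        anchor = self.encoder(anchor_ids)                              # (b, dim)
        cands = self.encoder(candidate_ids.view(b * n_cand, seq_len))  # (b*n_cand, dim)
        cands = cands.view(b, n_cand, -1)

        # Project each anchor according to the distance k it should predict.
        proj = torch.stack([self.k_proj[int(k)](a) for a, k in zip(anchor, k_index)])

        # Sampled softmax over the candidate set: dot-product scores + cross-entropy.
        scores = torch.einsum("bd,bnd->bn", proj, cands)               # (b, n_cand)
        return F.cross_entropy(scores, target_index)


# Toy usage: batch of 2 anchors, 5 candidates each (1 true + 4 negatives).
model = ConponoLikeObjective()
loss = model(
    anchor_ids=torch.randint(0, 30522, (2, 16)),
    candidate_ids=torch.randint(0, 30522, (2, 5, 16)),
    k_index=torch.tensor([0, 3]),
    target_index=torch.tensor([2, 0]),
)
loss.backward()
```

The cross-entropy over candidate scores is what the abstract calls the sampled-softmax objective: randomly sampled negatives keep the softmax tractable without normalizing over every sentence in the corpus.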