Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the fo...

Full description

Saved in:
Bibliographic Details
Published inTransactions of the Association for Computational Linguistics Vol. 8; pp. 264 - 280
Main Authors Rothe, Sascha, Narayan, Shashi, Severyn, Aliaksei
Format Journal Article
LanguageEnglish
Published One Rogers Street, Cambridge, MA 02142-1209, USA MIT Press 01.01.2020
MIT Press Journals, The
The MIT Press
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.
Bibliography:Volume, 2020
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2307-387X
2307-387X
DOI:10.1162/tacl_a_00313