Multilingual Denoising Pre-training for Neural Machine Translation
Published in | Transactions of the Association for Computational Linguistics, Vol. 8, pp. 726-742 |
Main Authors | Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer |
Format | Journal Article |
Language | English |
Published | MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2020 |
Summary: | This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al.). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training. |
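The summary above describes the core recipe: pre-train a full sequence-to-sequence model with a denoising objective on multilingual monolingual data, then fine-tune it directly on parallel bi-text for translation. As a minimal sketch of that fine-tuning step, the snippet below assumes the released mBART checkpoint `facebook/mbart-large-cc25` as hosted on the Hugging Face hub and uses the Transformers `MBartForConditionalGeneration` and `MBartTokenizer` classes; the language codes, learning rate, and toy sentence pair are illustrative and not taken from the paper.

```python
# Minimal sketch: fine-tuning pre-trained mBART for supervised sentence-level MT
# with Hugging Face Transformers. Checkpoint name, language codes, and
# hyperparameters are illustrative, not the paper's exact setup.
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="en_XX", tgt_lang="ro_RO"
)

# Toy parallel data; in practice this would be the bi-text for the language pair.
src_sentences = ["UN Chief Says There Is No Military Solution in Syria"]
tgt_sentences = ["Şeful ONU declară că nu există o soluţie militară în Siria"]

# text_target tokenizes the references as labels (newer Transformers versions).
batch = tokenizer(
    src_sentences, text_target=tgt_sentences,
    return_tensors="pt", padding=True, truncation=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
outputs = model(**batch)   # cross-entropy loss over target tokens
outputs.loss.backward()
optimizer.step()

# After fine-tuning, translate by forcing the target-language token
# as the first decoder token.
model.eval()
inputs = tokenizer(src_sentences, return_tensors="pt")
generated = model.generate(
    **inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

In practice the fine-tuning loop would run over the full bi-text for many updates; the paper's released implementation is in fairseq, so this Transformers-based sketch only mirrors the overall workflow rather than the exact training configuration.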
Bibliography: | Volume, 2020 ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 2307-387X |
DOI: | 10.1162/tacl_a_00343 |