ZeUS: A Unified Training Framework for Constrained Neural Machine Translation

Bibliographic Details
Published in: IEEE Access, Vol. 12, pp. 124695–124704
Main Author: Yang, Murun
Format: Journal Article
Language: English
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), Piscataway, 2024

Summary: Unlike general translation, constrained translation necessitates the proper use of predefined restrictions, such as specific terminologies and entities, during the translation process. However, current neural machine translation (NMT) models exhibit proficient performance solely in the domains of general translation or constrained translation. In this work, the author introduces the zero-shot unified constrained translation training framework, which adopts a novel approach of transforming constraints into textual explanations, thereby harmonizing the tasks of constrained translation with general translation. Furthermore, the author discovers the pivotal role of constructing synthetic data for domain-specific constrained translation in enhancing the model's performance on constrained translation tasks. To this end, the author utilizes large language models (LLMs) to generate domain-specific synthetic data for constrained translation. Experiments across four datasets and four translation directions, incorporating both general and constrained translations, demonstrate that models trained with the proposed framework and synthetic data achieve superior translation quality and constraint satisfaction rates, surpassing several baseline models in both general and constrained translation. Notably, ZeUS also exhibits significant advantages over multitask learning in constrained translation, with an average improvement of 7.25 percentage points in translation satisfaction rate (TSR) and 8.50 percentage points in translation completeness (TC).
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3454510
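The abstract's core idea, transforming constraints into textual explanations so that constrained translation can be handled as ordinary input by a general translation model, can be sketched as follows. This is a minimal illustrative sketch, assuming a simple prepended-instruction template; the function names and the template wording are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): render terminology
# constraints as a plain-text explanation prepended to the source
# sentence, so constrained and general translation share one input format.

def constraints_to_explanation(constraints):
    """Render (source_term, target_term) pairs as a textual instruction.

    The template here is a hypothetical example of a "constraint as
    textual explanation"; the paper's exact wording may differ.
    """
    parts = [f'translate "{src}" as "{tgt}"' for src, tgt in constraints]
    return "Constraints: " + "; ".join(parts) + "."

def build_input(source_sentence, constraints):
    """Unify the two tasks: with an empty constraint list the input
    reduces to the plain source sentence (general translation)."""
    if not constraints:
        return source_sentence
    return constraints_to_explanation(constraints) + " " + source_sentence

# Constrained case: the terminology requirement becomes part of the text.
print(build_input(
    "The patient shows signs of myocardial infarction.",
    [("myocardial infarction", "Herzinfarkt")],
))
# General case: no constraints, unchanged input.
print(build_input("The patient shows signs of myocardial infarction.", []))
```

With this formulation a single model can be trained on a mixture of constrained and general examples without any architectural change, which is what lets the framework "harmonize" the two tasks in the abstract's description.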