Domain-adaptive medical literature neural machine translation model training method

Bibliographic Details
Main Authors DONG SHOUBIN, HU JINLONG, YUAN HUA, ZHANG SHAOYUAN
Format Patent
Language Chinese, English
Published 18.06.2021
Summary: The invention discloses a method for training a domain-adaptive neural machine translation model for medical literature. The method comprises the following steps: 1) preprocessing the in-domain and out-of-domain data sets; 2) training an out-of-domain subword neural machine translation model on the out-of-domain subword training set, using a dynamically shrinking training set; 3) using an improved data selection method to select, from the out-of-domain data set, data similar to the in-domain parallel data set in order to augment the in-domain data set; 4) training a small classifier or a language model on a high-quality, manually corrected subword medical data set to obtain training weights for the sentence pairs of the in-domain subword training set, and adding these weights as training parameters to the continued-training process; and 5) in combination with the in-domain subword training set and the training weight file obtained by processin
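Step 4 of the summary describes per-sentence-pair training weights that are fed into further training. The record does not say how the weights enter the loss, so the following is only a minimal sketch under one common assumption: each sentence pair's negative log-likelihood is scaled by its weight and the total is normalized by the sum of the weights. The function name `weighted_nll` and its inputs are hypothetical and are not taken from the patent.

```python
import math


def weighted_nll(token_log_probs, weights):
    """Weighted average of per-sentence negative log-likelihoods.

    token_log_probs: list of lists; each inner list holds the model's
        log-probabilities of the reference target tokens for one sentence pair.
    weights: one non-negative weight per sentence pair (assumed here to come
        from the classifier or language model mentioned in step 4).
    """
    if len(token_log_probs) != len(weights):
        raise ValueError("need exactly one weight per sentence pair")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    loss = 0.0
    for log_probs, weight in zip(token_log_probs, weights):
        sentence_nll = -sum(log_probs)   # NLL of one sentence pair
        loss += weight * sentence_nll    # low-weight pairs contribute less
    return loss / total_weight


# Usage: two toy sentence pairs, one target token each.
batch = [[math.log(0.5)], [math.log(0.25)]]
uniform = weighted_nll(batch, [1.0, 1.0])   # both pairs count equally
skewed = weighted_nll(batch, [1.0, 0.0])    # second pair is ignored
```

With a weight of zero the second sentence pair drops out of the loss entirely, which is how a weighting scheme of this kind can suppress noisy or out-of-domain pairs without removing them from the training file.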
Bibliography: Application Number: CN202110332815