Domain-adaptive medical literature neural machine translation model training method

Bibliographic Details
Main Authors	DONG SHOUBIN, HU JINLONG, YUAN HUA, ZHANG SHAOYUAN
Format	Patent
Language	Chinese, English
Published	18.06.2021
Subjects	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

More Information
Summary:	The invention discloses a method for training a domain-adaptive neural machine translation model for medical literature. The method comprises the following steps: 1) preprocessing the in-domain and out-of-domain data sets; 2) training an out-of-domain subword neural machine translation model on the out-of-domain subword training set, using a dynamically decreasing training set; 3) using an improved data selection method to select, from the out-of-domain data set, data similar to the in-domain parallel data set, and adding it to the in-domain data set; 4) training a small classifier or a language model on a manually error-corrected, high-quality subword medical data set to obtain training weights for the sentence pairs of the in-domain subword training set, and adding these weights as training parameters to the continued-training process; and 5) in combination with the in-domain subword training set and the training weight file obtained by processin
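The summary does not specify the "improved data selection method", the schedule by which the training set decreases, or how the classifier or language model turns into sentence-pair weights. As an illustration only, the following minimal Python sketch swaps in common stand-ins, all of which are assumptions rather than the patented method: add-one-smoothed unigram language models in place of real LMs, plain Moore-Lewis cross-entropy difference selection for step 3 (scoring one language side only), a hypothetical geometric shrinking schedule for step 2, and an exponential mapping from cross-entropy to weights plus a weighted mean loss for step 4. Every function name here (`UnigramLM`, `dynamic_subset`, `moore_lewis_select`, `sentence_weights`, `weighted_loss`) is hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one-smoothed unigram LM over whitespace tokens (stand-in for a real LM)."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        # +1 reserves probability mass for unseen tokens
        self.vocab_size = len(self.counts) + 1

    def cross_entropy(self, sentence):
        """Average negative log2-probability per token of `sentence`."""
        tokens = sentence.split()
        if not tokens:
            return 0.0
        nll = 0.0
        for tok in tokens:
            p = (self.counts[tok] + 1) / (self.total + self.vocab_size)
            nll -= math.log2(p)
        return nll / len(tokens)


def dynamic_subset(ranked, epoch, decay=0.5, min_size=1):
    """Step 2 (assumed schedule): keep the top decay**epoch share of a ranked list."""
    size = max(min_size, int(len(ranked) * decay ** epoch))
    return ranked[:size]


def moore_lewis_select(out_domain, in_domain_lm, out_domain_lm, k):
    """Step 3 (stand-in): keep the k sentences with lowest H_in(s) - H_out(s)."""
    ranked = sorted(
        out_domain,
        key=lambda s: in_domain_lm.cross_entropy(s) - out_domain_lm.cross_entropy(s),
    )
    return ranked[:k]


def sentence_weights(sentences, quality_lm, temperature=1.0):
    """Step 4 (assumed mapping): cross-entropy under a high-quality LM -> weight in (0, 1]."""
    ces = [quality_lm.cross_entropy(s) for s in sentences]
    if not ces:
        return []
    best = min(ces)
    # The best-scoring sentence gets weight 1.0; worse ones decay exponentially.
    return [math.exp(-(ce - best) / temperature) for ce in ces]


def weighted_loss(per_sentence_losses, weights):
    """Weighted mean of per-sentence losses for continued training."""
    total = sum(loss * w for loss, w in zip(per_sentence_losses, weights))
    return total / sum(weights)


if __name__ == "__main__":
    in_domain = [
        "the patient received a dose of insulin",
        "tumor cells were observed in the biopsy",
    ]
    pool = [
        "the match ended in a draw",
        "the patient received insulin therapy",
        "stock prices fell sharply",
        "biopsy showed tumor cells",
    ]
    lm_in, lm_out = UnigramLM(in_domain), UnigramLM(pool)
    selected = moore_lewis_select(pool, lm_in, lm_out, k=2)
    print(selected)
    print(sentence_weights(selected, lm_in))
```

Sorting by the cross-entropy difference favours sentences that the in-domain model finds likelier than the out-of-domain model does, so generic but frequent words do not dominate the selection. In a real system the unigram models would be replaced by proper n-gram or neural LMs trained on subword-segmented text, and the out-of-domain LM would typically be trained on a random sample of the pool comparable in size to the in-domain data.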
Bibliography:	Application Number: CN202110332815