Text data generation method for training English grammar error correction model

Bibliographic Details
Main Authors	QIN LONG, CHEN JIN, XU SHUYAO
Format	Patent
Language	Chinese, English
Published	01.11.2019
Subjects	CALCULATING; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS
Summary:	The invention relates to the technical field of data generation, and in particular to a text data generation method for training an English grammatical error correction model. The method comprises the following steps: (1) introducing the number of errors for each sentence; (2) determining an error type; (3) performing the corresponding Word Tree replacement according to the error type; and (4) using the WMT11 monolingual data and the One-Billion-Word monolingual data to generate pre-training data for the grammar correction model. The method effectively improves the performance of the grammar correction model.
Bibliography:	Application Number: CN201910719334
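
The four steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not disclose how the error count or error type is sampled, nor how the Word Tree is structured. The sketch therefore assumes uniform sampling and models the Word Tree as a hypothetical per-error-type lookup table (`WORD_TREE`) with invented entries. The function names `corrupt` and `make_pairs` and the `max_errors` parameter are likewise illustrative, and a short in-memory list stands in for the WMT11 and One-Billion-Word monolingual corpora.

```python
import random

# Hypothetical stand-in for the "Word Tree": for each error type, a map from
# a correct word to plausible erroneous substitutes ("" means deletion).
# All entries are invented for illustration.
WORD_TREE = {
    "article": {"a": ["the", ""], "the": ["a", ""], "an": ["a", "the"]},
    "preposition": {"in": ["on", "at"], "on": ["in", "at"], "at": ["in", "on"]},
    "verb_form": {"goes": ["go", "going"], "is": ["are", "be"], "has": ["have"]},
}


def corrupt(sentence, rng, max_errors=3):
    """Return a copy of `sentence` with synthetic grammar errors injected."""
    tokens = sentence.split()
    # Step 1: choose how many errors to attempt (uniform sampling is assumed).
    n_errors = rng.randint(0, max_errors)
    for _ in range(n_errors):
        # Step 2: choose an error type (uniform sampling is assumed).
        error_type = rng.choice(sorted(WORD_TREE))
        table = WORD_TREE[error_type]
        # Step 3: replace one matching token according to the lookup table.
        candidates = [i for i, tok in enumerate(tokens) if tok.lower() in table]
        if not candidates:
            continue  # no token of this type in the sentence; skip this error
        i = rng.choice(candidates)
        tokens[i] = rng.choice(table[tokens[i].lower()])
    # Drop tokens that were replaced by the empty string (deletions).
    return " ".join(tok for tok in tokens if tok)


def make_pairs(corpus, seed=0):
    """Step 4: turn clean monolingual sentences into (noisy, clean) pairs."""
    rng = random.Random(seed)
    return [(corrupt(sentence, rng), sentence) for sentence in corpus]
```

Calling `make_pairs(["She goes to school in the morning"])` yields a list of (corrupted source, clean target) pairs of the kind a sequence-to-sequence correction model could be pre-trained on. Fixing the seed makes the generated data reproducible.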