Text data generation method for training English grammar error correction model

Bibliographic Details
Main Authors: QIN LONG, CHEN JIN, XU SHUYAO
Format: Patent
Language: Chinese, English
Published: 01.11.2019

Summary: The invention relates to the technical field of data generation, in particular to a text data generation method for training an English grammar error correction model. The method comprises the following steps: (1) introducing the number of errors per sentence; (2) determining an error type; (3) performing the corresponding Word Tree replacement according to the error type; and (4) using WMT11 monolingual data and One-Billion-Word monolingual data to generate pre-training data for the grammar correction model. The method is stated to effectively improve the performance of the grammar correction model.
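The abstract names the four steps but gives no details. The following is a minimal Python sketch of how such a pipeline could look, under stated assumptions: the error count is drawn uniformly from 1 to a cap, the error types and their "Word Tree" entries are small hypothetical lookup tables (the abstract does not define the Word Tree structure), and only substitution errors are generated. Loading WMT11 or One-Billion-Word data is not shown; `corpus` simply stands in for lines from either monolingual corpus. All names (`WORD_TREES`, `corrupt_sentence`, `build_pretraining_pairs`) are illustrative, not the patent's.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical "word trees": for each error type, a map from a correct word to
# erroneous alternatives. Illustrative only; not taken from the patent.
WORD_TREES: Dict[str, Dict[str, List[str]]] = {
    "article": {"a": ["an", "the"], "an": ["a", "the"], "the": ["a", "an"]},
    "preposition": {"in": ["on", "at"], "on": ["in", "at"], "at": ["in", "on"]},
    "verb_form": {"is": ["are", "be"], "are": ["is", "be"], "goes": ["go", "going"]},
    "noun_number": {"book": ["books"], "books": ["book"], "child": ["children"]},
}


def sample_error_count(rng: random.Random, max_errors: int) -> int:
    """Step 1 (assumed): draw how many errors to inject, uniformly in 1..max_errors."""
    return rng.randint(1, max_errors)


def corrupt_sentence(sentence: str, rng: random.Random, max_errors: int = 3) -> Tuple[str, str]:
    """Return (corrupted_source, clean_target) for one monolingual sentence."""
    tokens = sentence.split()
    corrupted = list(tokens)
    used = set()  # positions already corrupted, so each error hits a new token
    for _ in range(sample_error_count(rng, max_errors)):
        # Step 2: pick an error type that has at least one matching, unused token.
        candidates = {
            etype: [i for i, tok in enumerate(tokens) if i not in used and tok.lower() in tree]
            for etype, tree in WORD_TREES.items()
        }
        applicable = [etype for etype, positions in candidates.items() if positions]
        if not applicable:
            break
        etype = rng.choice(applicable)
        # Step 3: replace one matching token with an alternative from that type's tree.
        i = rng.choice(candidates[etype])
        replacement = rng.choice(WORD_TREES[etype][tokens[i].lower()])
        corrupted[i] = replacement.capitalize() if tokens[i][:1].isupper() else replacement
        used.add(i)
    return " ".join(corrupted), sentence


def build_pretraining_pairs(corpus: List[str], seed: int = 0, max_errors: int = 3) -> List[Tuple[str, str]]:
    """Step 4: turn monolingual sentences into (noisy, clean) training pairs."""
    rng = random.Random(seed)  # seeded for reproducible synthetic data
    return [corrupt_sentence(s, rng, max_errors) for s in corpus]


if __name__ == "__main__":
    corpus = ["She is in the library .", "The children read a book at home ."]
    for src, tgt in build_pretraining_pairs(corpus, seed=42):
        print(src, "=>", tgt)
```

Each pair maps an artificially corrupted source to the original sentence as the correction target, which is the usual shape of synthetic pre-training data for a grammar correction model. Sentences containing no word-tree entries pass through unchanged.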
Bibliography: Application Number: CN201910719334