EDMSpell: Incorporating the error discriminator mechanism into Chinese spelling correction for the overcorrection problem

Bibliographic Details
Published in: Journal of King Saud University. Computer and information sciences, Vol. 35, no. 6, p. 101573
Main Authors: Sheng, Lei; Xu, Zhenxing; Li, Xiaolong; Jiang, Zhansi
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2023
Summary: Chinese spelling correction (CSC), which aims to detect and correct typos in text, is an important and challenging task that has attracted increasing attention. Current CSC models based on pre-trained language models (PLMs) show powerful error correction capabilities, but they also overcorrect: many characters that were already correct are changed. Overcorrection is particularly serious in real-world application scenarios. To alleviate it, we propose a novel post-processing model, EDMSpell, which uses two discriminators to post-process the corrected results. Specifically, it decides whether to adopt a correction by judging whether the original sentence and the corrected sentence are each correct. To verify its effectiveness, we conduct comprehensive experiments and ablation tests. Experiments on the SIGHAN15 benchmark show that EDMSpell considerably lowers the false-positive rate, by 5.4 points on average, while also improving the error-correction F1 score by 1.4 points on average across nine models.
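
The post-processing idea in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not state how the two discriminator outputs are combined, so the rule below (adopt a correction only when the original sentence is judged erroneous and the corrected sentence is judged correct) is an assumption, as are the names `post_process`, `judge_original`, `judge_corrected`, and the `threshold` parameter. The toy discriminator is a lookup table standing in for a trained model.

```python
from typing import Callable

# Hypothetical interface: a discriminator maps a sentence to the
# probability that the sentence contains no spelling error.
Discriminator = Callable[[str], float]


def post_process(
    original: str,
    corrected: str,
    judge_original: Discriminator,
    judge_corrected: Discriminator,
    threshold: float = 0.5,
) -> str:
    """Decide whether to keep a CSC model's proposed correction.

    Assumed rule (not taken from the paper): adopt the correction only
    when the original is judged erroneous AND the corrected sentence is
    judged correct; otherwise fall back to the original sentence.
    """
    if corrected == original:
        return original  # the CSC model changed nothing
    original_ok = judge_original(original) >= threshold
    corrected_ok = judge_corrected(corrected) >= threshold
    if not original_ok and corrected_ok:
        return corrected
    return original


# Toy stand-in for a trained discriminator: fixed scores per sentence.
toy_scores = {
    "sentence with typo": 0.1,
    "sentence without typo": 0.9,
    "already fine sentence": 0.95,
    "needlessly changed sentence": 0.6,
}


def toy_judge(sentence: str) -> float:
    return toy_scores.get(sentence, 0.0)


# A genuine fix is adopted; an overcorrection of a fine sentence is rejected.
fixed = post_process("sentence with typo", "sentence without typo", toy_judge, toy_judge)
kept = post_process("already fine sentence", "needlessly changed sentence", toy_judge, toy_judge)
```

In this sketch the second call returns the original sentence even though the rewritten sentence also scores as correct, which is the overcorrection case the summary targets. A single toy function is used for both discriminator roles only for brevity.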
ISSN: 1319-1578, 2213-1248
DOI: 10.1016/j.jksuci.2023.101573