EDMSpell: Incorporating the error discriminator mechanism into Chinese spelling correction for the overcorrection problem
Published in | Journal of King Saud University. Computer and information sciences Vol. 35; no. 6; p. 101573 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.06.2023 |
Subjects | |
Summary: | Chinese spelling correction (CSC), which aims to detect and correct typos in text, is an important and challenging task that has attracted increasing attention. Current CSC models based on pre-trained language models (PLMs) show powerful error correction capabilities, but they tend to overcorrect: many characters that were already correct are changed. This overcorrection problem is particularly serious in real-world application scenarios. To alleviate it, we propose a novel post-processing model, EDMSpell, which uses two discriminators to post-process the corrected results. Specifically, the discriminators judge whether the original sentence and the corrected sentence are each correct, and this judgment determines whether the correction is adopted. To verify its effectiveness, we conduct comprehensive experiments and ablation tests. Experiments on the SIGHAN15 benchmark show that EDMSpell considerably lowers the false-positive rate, by 5.4 points on average, while also improving the error correction F1 score by 1.4 points on average across nine models. |
---|---|
ISSN: | 1319-1578 2213-1248 |
DOI: | 10.1016/j.jksuci.2023.101573 |
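
The summary describes the post-processing step only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. The decision rule (adopt a correction only when the original sentence is judged erroneous and the corrected sentence is judged correct), the `Discriminator` type, the function name `post_process`, and the `threshold` parameter are all assumptions introduced for illustration.

```python
from typing import Callable

# Hypothetical type: a discriminator maps a sentence to the estimated
# probability that the sentence contains no spelling error.
Discriminator = Callable[[str], float]


def post_process(
    original: str,
    corrected: str,
    judge_original: Discriminator,
    judge_corrected: Discriminator,
    threshold: float = 0.5,
) -> str:
    """Decide whether to keep a CSC model's correction (assumed rule)."""
    # Nothing to decide if the CSC model left the sentence unchanged.
    if corrected == original:
        return original

    original_ok = judge_original(original) >= threshold
    corrected_ok = judge_corrected(corrected) >= threshold

    # Assumed rule: adopt the correction only when the original looks
    # erroneous and the corrected sentence looks correct. In every other
    # case fall back to the original, which guards against overcorrection.
    if not original_ok and corrected_ok:
        return corrected
    return original
```

In practice each discriminator would presumably be a trained sentence-level classifier. Falling back to the original sentence whenever the two judgments do not both support the change is what would push the false-positive rate down in this reading.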