Investigating Recurrent Neural Networks for Diacritizing Arabic Text and Correcting Soft Spelling Mistakes
| Published in | 2021 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), pp. 266-271 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Publisher | IEEE |
| Published | 16.11.2021 |
Summary: Modern Arabic language processing research aims to find efficient solutions to important problems such as adding diacritics to undiacritized text and correcting spelling mistakes. Some previous machine learning solutions treat these as one-to-one sequence transcription problems, which restricts the types of errors that can be corrected. This work investigates transformer and encoder-decoder recurrent neural network models, taking these two problems as case studies. Such models solve sequence-to-sequence transcription problems without restrictions on the input and output lengths. We investigated several alternatives and evaluated them on two benchmark datasets. We recommend an encoder-decoder model that provides excellent accuracy. The encoder part has an embedding layer and a long short-term memory (LSTM) layer. The decoder part has an embedding layer, an LSTM layer, Luong attention between the two parts, and a dense output layer with softmax activation.
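The recommended model connects the decoder to the encoder through Luong attention, i.e. multiplicative (dot-product) scoring of each decoder state against all encoder states, followed by a softmax-weighted sum. The paper's code is not reproduced here; the following is a minimal NumPy sketch of that scoring step, with illustrative function and variable names and made-up dimensions:

```python
import numpy as np

def luong_attention(dec_states, enc_states):
    """Dot-product (Luong) attention.

    dec_states: (T_dec, d) decoder LSTM outputs
    enc_states: (T_enc, d) encoder LSTM outputs
    Returns a context vector per decoder step and the attention weights.
    """
    # Multiplicative score: each decoder state against every encoder state.
    scores = dec_states @ enc_states.T                      # (T_dec, T_enc)
    # Softmax over encoder time steps (stabilized by subtracting the max).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True) # rows sum to 1
    # Context: attention-weighted sum of encoder states.
    context = weights @ enc_states                          # (T_dec, d)
    return context, weights
```

In the full model, each context vector would be concatenated with the corresponding decoder state before the dense softmax output layer predicts the diacritized or corrected character.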
DOI: 10.1109/JEEIT53412.2021.9634126