Investigating Recurrent Neural Networks for Diacritizing Arabic Text and Correcting Soft Spelling Mistakes

Bibliographic Details
Published in: 2021 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), pp. 266 - 271
Main Authors: Almajdoubah, Ahmad N., Abandah, Gheith A., Suyyagh, Ashraf E.
Format: Conference Proceeding
Language: English
Published: IEEE, 16.11.2021

Summary: Modern Arabic language processing research aims to find efficient solutions to important problems such as adding diacritics to undiacritized text and correcting spelling mistakes. Some previous machine learning solutions treat these problems as one-to-one sequence transcription problems, which restricts the types of errors that can be corrected. This work investigates transformer and encoder-decoder recurrent neural network models, taking these two problems as case studies. These models solve sequence-to-sequence transcription problems without restrictions on the input and output lengths. We investigated several alternatives and evaluated them on two benchmark datasets. We recommend an encoder-decoder model that provides excellent accuracy. The encoder consists of an embedding layer and a long short-term memory (LSTM) layer. The decoder consists of an embedding layer, an LSTM layer, Luong attention between the two parts, and a dense output layer with softmax activation.
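The Luong attention mechanism named in the recommended architecture can be illustrated with a minimal NumPy sketch of its dot-product (multiplicative) scoring variant. This is not the authors' implementation; the hidden size and sequence length below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def luong_dot_attention(dec_state, enc_states):
    """Luong-style dot-product attention.

    dec_state:  (d,)   current decoder hidden state
    enc_states: (T, d) encoder hidden states, one per input position
    Returns (context, weights): the attention-weighted sum of encoder
    states and the attention distribution over the T input positions.
    """
    scores = enc_states @ dec_state   # (T,) alignment scores
    weights = softmax(scores)         # (T,) attention distribution
    context = weights @ enc_states    # (d,) context vector for the decoder
    return context, weights

# Toy example: 4 encoder positions, hidden size 3 (both assumed values).
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))
dec = rng.normal(size=(3,))
ctx, w = luong_dot_attention(dec, enc)
```

In a full encoder-decoder model of the kind the paper recommends, this step would run once per decoder time step, and the context vector would be combined with the decoder's LSTM output before the dense softmax layer.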
DOI:10.1109/JEEIT53412.2021.9634126