Online continuous multi-stroke Persian/Arabic character recognition by novel spatio-temporal features for digitizer pen devices

Bibliographic Details
Published in: Neural computing & applications, Vol. 32, no. 8, pp. 3853-3872
Main Authors: Valikhani, Sara; Abdali-Mohammadi, Fardin; Fathi, Abdolhossein
Format: Journal Article
Language: English
Published: London, Springer London, 01.04.2020; Springer Nature B.V
Summary: Digitizer pens have become the front end of many digital devices, and their growing use has created a need for pen-based virtual keyboard systems. Despite attempts to create such systems for English, their absence for Persian/Arabic is a clear gap. This paper presents an online continuous Persian/Arabic character recognition method. A Persian/Arabic character is made up of two types of signs, or strokes: a main body and zero or more delayed strokes. A set of novel, discriminative spatial features is defined for these strokes. These features are then used in a novel algorithm to build a genetic programming-based decision tree (GPDT). A non-deterministic finite automaton (NDFA) uses the GPDT and the spatio-temporal features to recognize groups of related strokes and the corresponding characters. Spatio-temporal features are needed because some Persian/Arabic letters share the same main body (e.g., “ح، خ، ج، چ”). Two further issues in recognizing Persian/Arabic letters, the unknown number of delayed stroke segments and the identical placement of delayed strokes, are resolved by the NDFA. After the GPDT identifies the main-body group, each recognized stroke triggers a move in the NDFA until it stops in a character state (the final state at the end of a path in the NDFA). The proposed algorithm recognizes continuous Persian/Arabic letters and digits with 92.43% accuracy and isolated letters and digits with 97.52% accuracy.
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-019-04225-6
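
Illustration: the NDFA step described in the summary can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. It assumes the GPDT has already placed the main body in the group shared by “ح، خ، ج، چ”, and that each delayed stroke has been reduced to a label. The stroke labels (dot_above, dot_below, two_dots_below, three_dots_below), the state names, and the transition table are all hypothetical and chosen only to show how non-determinism can absorb an unknown number of delayed stroke segments.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical NDFA for one main-body group (the body shared by four letters).
# Keys are (state, delayed-stroke label); values are sets of next states.
# A single dot below is ambiguous: it may complete one letter or be the
# first of three dots, so it leads to two states at once.
TRANSITIONS: Dict[Tuple[str, str], Set[str]] = {
    ("body", "dot_above"): {"khah"},
    ("body", "dot_below"): {"jeem", "tcheh_1"},
    ("body", "two_dots_below"): {"tcheh_2"},
    ("body", "three_dots_below"): {"tcheh"},
    ("tcheh_1", "dot_below"): {"tcheh_2"},
    ("tcheh_1", "two_dots_below"): {"tcheh"},
    ("tcheh_2", "dot_below"): {"tcheh"},
}

# Character (final) states and the letter each one stands for.
FINAL: Dict[str, str] = {
    "body": "\u062d",   # hah: main body with no delayed stroke
    "khah": "\u062e",   # khah: one dot above
    "jeem": "\u062c",   # jeem: one dot below
    "tcheh": "\u0686",  # tcheh: three dots below
}


def recognize(delayed_strokes: Iterable[str], start: str = "body") -> List[str]:
    """Run the NDFA over the delayed strokes and return candidate letters.

    The automaton tracks every state it could be in. An empty result means
    no path ended in a character state.
    """
    current: Set[str] = {start}
    for label in delayed_strokes:
        next_states: Set[str] = set()
        for state in current:
            next_states |= TRANSITIONS.get((state, label), set())
        current = next_states
        if not current:
            return []
    return sorted(FINAL[s] for s in current if s in FINAL)


if __name__ == "__main__":
    print(recognize([]))
    print(recognize(["dot_below"]))
    print(recognize(["dot_below", "dot_below", "dot_below"]))
```

In this sketch the three dots of the last letter may arrive as one segment, as a pair plus a single dot, or as three separate dots, and all of these paths end in the same character state. An incomplete sequence, such as two dots below, ends in a non-final state and yields no candidate.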