Character segmentation for Nastaleeq URDU OCR: A review

Bibliographic Details
Published in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 1489-1493
Main Authors Ganai, Aejaz Farooq; Lone, Faisal Rasheed
Format Conference Proceeding
Language English
Published IEEE 01.03.2016
Summary: Urdu Nastaleeq is a highly cursive, context-sensitive script written diagonally from top right to bottom left, which makes it difficult to segment a partial or complete word into characters. The stacking of characters makes segmentation at the character level harder still. Some researchers have performed ligature-level segmentation and have succeeded to a great extent, but segmentation accuracy is still low and needs to be improved. This paper presents a methodology for segmenting Urdu Nastaleeq at the character level and discusses in detail the challenges encountered during segmentation.
DOI: 10.1109/ICEEOT.2016.7754931