Urdu ligature recognition using multi-level agglomerative hierarchical clustering

Bibliographic Details
Published in	Cluster Computing, Vol. 21, no. 1, pp. 503-514
Main Authors	Khan, Naila Habib; Adnan, Awais; Basar, Sadia
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2018; Springer Nature B.V
Subjects	Accuracy; Algorithms; Classification; Cluster analysis; Clustering; Computer Communication Networks; Computer Science; Decision analysis; Decision trees; Dictionaries; Discriminant analysis; Handwriting; Ideograph recognition; Language policy; Machine learning; Neural networks; Operating Systems; Optical character recognition; Processor Architectures; R&D; Research & development; Segmentation; Urdu language; Pakistan; Agglomerative Clustering; Urdu Classification; OCR

Summary:	Optical character recognition (OCR) systems hold great significance in human-machine interaction. OCR has been the subject of intensive research, especially for Latin, Chinese and Japanese scripts. Comparatively little work has been done on Urdu OCR, owing to the complexities and segmentation errors associated with its cursive script. This paper proposes an Urdu OCR system that aims at ligature-level recognition of Urdu text. This ligature-based recognition approach overcomes the character-level segmentation problems associated with cursive scripts. A newly developed OCR algorithm is introduced that uses semi-supervised multi-level clustering to categorize the ligatures. Classification is performed using four machine learning techniques: decision trees, linear discriminant analysis, naive Bayes and k-nearest neighbor (K-NN). The system was implemented, and the results show accuracies of 62%, 61%, 73% and 90% for decision tree, linear discriminant analysis, naive Bayes and K-NN, respectively.
ISSN:	1386-7857 1573-7543
DOI:	10.1007/s10586-017-0916-2