Text line extraction from handwritten document pages using spiral run length smearing algorithm

Bibliographic Details
Published in	2012 International Conference on Communications, Devices and Intelligent Systems (CODIS) pp. 616 - 619
Main Authors	Malakar, S., Halder, S., Sarkar, R., Das, N., Basu, S., Nasipuri, M.
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2012
Subjects	CMATERdb; Decision support systems; Handwritten document pages; Intelligent systems; OCR; SRLSA; Text line extraction; Vertical partitioning

More Information
Summary:	Extraction of text lines from document images is one of the important steps in an Optical Character Recognition (OCR) system. In handwritten document images, the presence of skewed, touching, or overlapping text lines makes this step a real challenge. In the present work, a new text line extraction technique based on the Spiral Run Length Smearing Algorithm (SRLSA) is reported. First, the digitized document image is partitioned into a number of vertical fragments of equal width. Then all the text line segments present in these fragments are identified by applying SRLSA. Finally, neighboring text line segments are analyzed and, if necessary, merged so that they fall inside the boundary of the text line to which they actually belong. The technique is tested on the CMATERdb1.1.1 and CMATERdb1.2.1 databases, from which it successfully extracts 87.09% and 89.35% of the text lines, respectively.
ISBN:	9781467346993 1467346993
DOI:	10.1109/CODIS.2012.6422278
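
The three steps named in the summary (vertical partitioning, smearing inside each fragment, merging neighboring segments) can be illustrated with a rough sketch. This is not the authors' SRLSA: the summary does not describe the spiral scan, so the classic horizontal run-length smearing is substituted here. All function names and the parameters `fragment_width`, `max_gap`, and `min_fill` are hypothetical, and the image is assumed to be a binary list of rows with 1 for ink.

```python
def split_into_fragments(image, fragment_width):
    """Cut a binary image into vertical fragments (the last one may be narrower)."""
    width = len(image[0])
    return [[row[x:x + fragment_width] for row in image]
            for x in range(0, width, fragment_width)]


def smear_row(row, max_gap):
    """Classic horizontal RLSA (a stand-in, not the spiral variant):
    fill background runs of at most max_gap pixels lying between two ink pixels."""
    out = list(row)
    last_ink = None
    for x, pixel in enumerate(row):
        if pixel:
            if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                for k in range(last_ink + 1, x):
                    out[k] = 1
            last_ink = x
    return out


def find_segments(fragment, max_gap, min_fill):
    """Return (top, bottom) ranges of consecutive rows that are mostly ink after smearing."""
    segments, start = [], None
    for y, row in enumerate(fragment):
        is_text = bool(row) and sum(smear_row(row, max_gap)) >= min_fill * len(row)
        if is_text and start is None:
            start = y
        elif not is_text and start is not None:
            segments.append((start, y - 1))
            start = None
    if start is not None:
        segments.append((start, len(fragment) - 1))
    return segments


def extract_text_lines(image, fragment_width, max_gap, min_fill):
    """Chain vertically overlapping segments of adjacent fragments into text lines.
    Each line is a list of (fragment_index, top, bottom) tuples."""
    lines = []
    fragments = split_into_fragments(image, fragment_width)
    for index, fragment in enumerate(fragments):
        for top, bottom in find_segments(fragment, max_gap, min_fill):
            for line in lines:
                prev_index, prev_top, prev_bottom = line[-1]
                # Attach only to a line that ended in the previous fragment
                # and whose last segment overlaps this one vertically.
                if prev_index == index - 1 and top <= prev_bottom and bottom >= prev_top:
                    line.append((index, top, bottom))
                    break
            else:
                lines.append([(index, top, bottom)])
    return lines
```

Splitting the page into narrow fragments means a skewed line only drifts by a few rows within each fragment, so a simple row-wise test can still find it; the overlap check then stitches the drifting pieces back together. The greedy first-match merge used here is a simplification chosen for brevity and would need refinement for touching or overlapping lines.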