Text line extraction from handwritten document pages using spiral run length smearing algorithm

Bibliographic Details
Published in: 2012 International Conference on Communications, Devices and Intelligent Systems (CODIS), pp. 616-619
Main Authors: Malakar, S., Halder, S., Sarkar, R., Das, N., Basu, S., Nasipuri, M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2012

Summary: Extraction of text lines from document images is one of the important steps in the processing pipeline of an Optical Character Recognition (OCR) system. For handwritten document images, the presence of skewed, touching, or overlapping text lines makes this process a real challenge for researchers. In the present work, a new text line extraction technique based on the Spiral Run Length Smearing Algorithm (SRLSA) is reported. First, the digitized document image is partitioned into a number of vertical fragments of equal width. Then all the text line segments present in these fragments are identified by applying SRLSA. Finally, neighboring text line segments are analyzed and merged (if necessary) so that each segment is placed inside the boundary of the text line to which it actually belongs. For experimental purposes, the technique is tested on the CMATERdb1.1.1 and CMATERdb1.2.1 databases. The present technique successfully extracts 87.09% and 89.35% of the text lines from these databases, respectively.
ISBN: 9781467346993 (ISBN-13), 1467346993 (ISBN-10)
DOI: 10.1109/CODIS.2012.6422278
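The three steps named in the summary (vertical fragmentation, smearing to find line segments, merging of neighboring segments) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The spiral smearing of SRLSA is not described in this record, so a plain horizontal run-length smearing stands in for it; line segments are assumed to be the row spans of 4-connected ink components in each smeared fragment; and two segments are assumed to belong to the same line when they lie in the same or adjacent fragments and their row spans overlap. The function names (smear_rows, split_fragments, component_spans, extract_lines) and the parameters n_fragments and threshold are hypothetical.

```python
# Hypothetical sketch, NOT the authors' SRLSA: plain horizontal run-length
# smearing stands in for the spiral variant, and the merge rule (row-span
# overlap between segments in the same or adjacent fragments) is an assumption.
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink (foreground), 0 = background


def smear_rows(img: Image, threshold: int) -> Image:
    """Fill background gaps of at most `threshold` pixels between two ink pixels in a row."""
    out = [row[:] for row in img]
    for row in out:
        last_ink = None
        for x, v in enumerate(row):
            if v:
                if last_ink is not None and 0 < x - last_ink - 1 <= threshold:
                    for k in range(last_ink + 1, x):
                        row[k] = 1
                last_ink = x
    return out


def split_fragments(img: Image, n: int) -> List[Image]:
    """Cut the page into n vertical strips of (near) equal width."""
    w = len(img[0])
    return [[row[i * w // n:(i + 1) * w // n] for row in img] for i in range(n)]


def component_spans(frag: Image) -> List[Tuple[int, int]]:
    """Row span (top, bottom) of every 4-connected ink component in a strip."""
    h, w = len(frag), len(frag[0])
    seen = [[False] * w for _ in range(h)]
    spans = []
    for y in range(h):
        for x in range(w):
            if frag[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, top, bottom = deque([(y, x)]), y, y
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and frag[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                spans.append((top, bottom))
    return sorted(spans)


def extract_lines(img: Image, n_fragments: int = 4, threshold: int = 3) -> List[Tuple[int, int]]:
    """Return the row span (top, bottom) of each extracted text line."""
    # Steps 1-2: fragment the page, smear each strip, record its line segments.
    segments = []  # (fragment index, top row, bottom row)
    for i, frag in enumerate(split_fragments(img, n_fragments)):
        segments += [(i, t, b) for t, b in component_spans(smear_rows(frag, threshold))]

    # Step 3: union-find merge of segments in the same or adjacent strips whose rows overlap.
    parent = list(range(len(segments)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, (fa, ta, ba) in enumerate(segments):
        for j in range(a + 1, len(segments)):
            fb, tb, bb = segments[j]
            if abs(fa - fb) <= 1 and ta <= bb and tb <= ba:
                parent[find(a)] = find(j)

    groups = {}
    for idx, (_, t, b) in enumerate(segments):
        groups.setdefault(find(idx), []).append((t, b))
    return sorted((min(t for t, _ in g), max(b for _, b in g)) for g in groups.values())


if __name__ == "__main__":
    page = [[1 if ch == "#" else 0 for ch in row] for row in [
        "##.##..####.",
        "#####..####.",
        "............",
        "............",
        ".##..#..##..",
        ".##.....##..",
    ]]
    print(extract_lines(page, n_fragments=3, threshold=3))  # → [(0, 1), (4, 5)]
```

On the toy page, each of the three strips yields one segment per line, and the overlap rule chains them into two lines with row spans (0, 1) and (4, 5). The overlap-based merge is deliberately naive: it would wrongly fuse touching or overlapping handwritten lines, which the summary names as the difficult cases, so a real system needs a more careful segment analysis at this step.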