Text line extraction from handwritten document pages using spiral run length smearing algorithm
Published in | 2012 International Conference on Communications, Devices and Intelligent Systems (CODIS) pp. 616 - 619 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2012 |
Summary: | Extracting text lines from document images is one of the important steps in an Optical Character Recognition (OCR) system. For handwritten document images, the presence of skewed, touching, or overlapping text lines makes this process a real challenge for researchers. In the present work, a new text line extraction technique based on the Spiral Run Length Smearing Algorithm (SRLSA) is reported. First, the digitized document image is partitioned into a number of vertical fragments of equal width. Then, all the text line segments present in these fragments are identified by applying SRLSA. Finally, neighboring text line segments are analyzed and, where necessary, merged so that each segment is placed within the boundary of the text line to which it actually belongs. The technique is evaluated on the CMATERdb1.1.1 and CMATERdb1.2.1 databases, from which it successfully extracts 87.09% and 89.35% of the text lines, respectively. |
ISBN: | 9781467346993 (ISBN-13); 1467346993 (ISBN-10) |
DOI: | 10.1109/CODIS.2012.6422278 |
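The first step in the summary, partitioning the digitized page into vertical fragments of equal width, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code. The page is assumed to be already binarized into a list of rows (1 = ink, 0 = background), and the function name `split_into_vertical_fragments` and its parameter `n_fragments` are hypothetical. The summary also does not say how leftover columns are handled, so here the last strip simply absorbs them.

```python
from typing import List, Tuple

# Hypothetical representation: a binarized page as a list of rows,
# where 1 marks an ink pixel and 0 marks background.
Image = List[List[int]]


def split_into_vertical_fragments(image: Image, n_fragments: int) -> List[Tuple[int, Image]]:
    """Cut `image` into `n_fragments` vertical strips of equal width.

    Returns (x_offset, strip) pairs so that coordinates found inside a strip
    can be mapped back to the page. Assumption: when the page width is not
    divisible by n_fragments, the last strip absorbs the leftover columns.
    """
    width = len(image[0]) if image else 0
    if not 1 <= n_fragments <= width:
        raise ValueError("n_fragments must be between 1 and the page width")
    base = width // n_fragments
    fragments = []
    for k in range(n_fragments):
        x0 = k * base
        # Every strip is `base` columns wide except the last, which runs to the edge.
        x1 = width if k == n_fragments - 1 else x0 + base
        fragments.append((x0, [row[x0:x1] for row in image]))
    return fragments


if __name__ == "__main__":
    page = [[0, 1, 1, 0, 0, 1, 0],
            [1, 0, 0, 0, 1, 1, 0]]
    for x0, strip in split_into_vertical_fragments(page, 3):
        print(x0, strip)
```

On this 7-column toy page the strips start at columns 0, 2 and 4, and the last strip is three columns wide. Keeping the x offset alongside each strip is a design choice that makes it easy to place segments found in a strip back into page coordinates.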
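The summary does not describe how the spiral variant (SRLSA) traverses the image. The sketch below therefore substitutes the classic horizontal Run Length Smearing Algorithm (RLSA) to illustrate the general idea of turning one fragment into text line segments. Background runs no longer than a threshold that lie between two ink pixels are filled, so neighboring characters fuse into blobs, and each connected blob's bounding box is reported as a candidate segment. The names `smear_row` and `segment_boxes`, the `threshold` parameter, and the use of 4-connected components are illustrative assumptions, not details taken from the paper.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive


def smear_row(row: List[int], threshold: int) -> List[int]:
    """Classic horizontal RLSA (not the spiral variant): fill every background
    run of length <= threshold that lies between two ink pixels."""
    out = list(row)
    last_ink = None  # column of the most recent ink pixel seen so far
    for x, v in enumerate(row):
        if v:
            if last_ink is not None:
                gap = x - last_ink - 1
                if 0 < gap <= threshold:
                    for gx in range(last_ink + 1, x):
                        out[gx] = 1
            last_ink = x
    return out


def segment_boxes(fragment: Image, threshold: int) -> List[Box]:
    """Smear each row of the fragment, then return the bounding box of every
    4-connected ink blob as a candidate text line segment."""
    smeared = [smear_row(r, threshold) for r in fragment]
    h = len(smeared)
    w = len(smeared[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if smeared[y][x] and not seen[y][x]:
                # Breadth-first flood fill of one blob, tracking its extent.
                top, bottom, left, right = y, y, x, x
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and smeared[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    fragment = [[1, 0, 0, 1, 0, 1],
                [0, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 0, 1]]
    print(segment_boxes(fragment, 2))
    print(segment_boxes(fragment, 3))
```

The threshold controls how aggressively strokes are joined. In this toy fragment, a threshold of 2 fuses the whole first row into one box but leaves the two ink pixels of the last row separate (their gap is 3), while a threshold of 3 fuses the last row as well.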
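The summary gives no criterion for the final step, merging neighboring text line segments across fragments. The sketch below therefore assumes a simple, hypothetical rule. Fragments are processed left to right, and a segment joins the line whose most recent segment overlaps it vertically by at least `min_overlap` of the shorter height. Each line accepts at most one segment per fragment, and a segment that matches no line starts a new one. Segments are represented only by their (top, bottom) row range, and the names `overlap_ratio`, `merge_segments`, and `min_overlap` are illustrative, not the paper's.

```python
from typing import List, Tuple

Interval = Tuple[int, int]          # (top, bottom) rows, inclusive, top <= bottom
Placed = Tuple[int, int, int]       # (fragment_index, top, bottom)


def overlap_ratio(a: Interval, b: Interval) -> float:
    """Vertical overlap of two row intervals, relative to the shorter one."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    shorter = min(a[1] - a[0], b[1] - b[0]) + 1
    return inter / shorter


def merge_segments(fragments: List[List[Interval]], min_overlap: float = 0.5) -> List[List[Placed]]:
    """Hypothetical greedy merge: chain segments of adjacent fragments into
    text lines by vertical overlap with each line's most recent segment."""
    lines: List[List[Placed]] = []
    for k, segments in enumerate(fragments):
        claimed = set()  # lines that already received a segment from fragment k
        for seg in sorted(segments):
            best, best_score = None, 0.0
            for i, line in enumerate(lines):
                if i in claimed:
                    continue
                _, top, bottom = line[-1]
                score = overlap_ratio((top, bottom), seg)
                if score > best_score:
                    best, best_score = i, score
            if best is not None and best_score >= min_overlap:
                lines[best].append((k, seg[0], seg[1]))
                claimed.add(best)
            else:
                # No sufficiently overlapping line: this segment starts a new one.
                lines.append([(k, seg[0], seg[1])])
                claimed.add(len(lines) - 1)
    return lines


if __name__ == "__main__":
    per_fragment = [[(10, 20), (40, 52)], [(12, 22), (41, 50)], [(44, 55)]]
    for line in merge_segments(per_fragment):
        print(line)
```

Measuring overlap against the shorter segment, rather than the union, is one way to tolerate segments of very different heights, such as a short word next to one with long descenders; a real system handling skewed or touching lines would likely need a more careful rule.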