A hierarchical and scalable model for contemporary document image segmentation
Published in | Pattern analysis and applications : PAA Vol. 16; no. 4; pp. 679 - 693 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | London; Springer London; 01.11.2013; Springer; Springer Verlag |
Summary: | In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and completely automated, i.e. user-independent. It proposes an adaptive binarization/quantization without any penalizing information loss. The model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to demonstrate our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large set of test images with complex layouts. |
---|---|
ISSN: | 1433-7541, 1433-755X |
DOI: | 10.1007/s10044-012-0282-x |