A hierarchical and scalable model for contemporary document image segmentation

Bibliographic Details
Published in Pattern analysis and applications : PAA Vol. 16; no. 4; pp. 679-693
Main Authors Ouji, Asma; Leydier, Yann; LeBourgeois, Frank
Format Journal Article
Language English
Published London Springer London 01.11.2013
Springer
Springer Verlag
Summary: In this paper, we introduce a novel color segmentation approach that is robust against digitization noise and adapted to contemporary document images. The system is scalable, hierarchical, versatile and completely automated, i.e. user-independent. It performs an adaptive binarization/quantization without any penalizing information loss. The model may be used for many purposes. For instance, we rely on it to carry out the first steps leading to advertisement recognition in document images. Furthermore, the color segmentation output is used to localize text areas and enhance optical character recognition (OCR) performance. We ran tests on a variety of magazine images to demonstrate our contribution to the well-known OCR product ABBYY FineReader. We also obtain promising results with our ad detection system on a large test set of images with complex layouts.
ISSN: 1433-7541
1433-755X
DOI: 10.1007/s10044-012-0282-x