Background variability modeling for statistical layout analysis

Bibliographic Details
Published in: 2008 19th International Conference on Pattern Recognition, pp. 1-4
Main Authors: Shafait, F., van Beusekom, J., Keysers, D., Breuel, T.M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2008
Summary: Geometric layout analysis plays an important role in document image understanding. Many algorithms known in literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. These algorithms rely on certain assumptions about document layouts, and fail when their underlying assumptions are not met. Also, they do not provide confidence scores for their output. These two problems limit the usefulness of general purpose layout analysis methods in large scale applications. In this contribution, we propose a statistically motivated model-based trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. The performance of our approach is tested on a subset of the Google 1000 books dataset where it achieved a text line segmentation accuracy of 98.4% on layouts where other general-purpose algorithms failed to do a correct segmentation.
ISBN: 9781424421749, 1424421748
ISSN: 1051-4651, 2831-7475
DOI: 10.1109/ICPR.2008.4760964