The extraction of text/graphs from degraded documents


Bibliographic Details
Published in 10th International Multimedia Modelling Conference, 2004. Proceedings, pp. 181-186
Main Authors Shwu-Huey Yen, Yi-Jen Chen, Hui-Jen Lin, Chia-Jen Wang
Format Conference Proceeding
Language English
Published IEEE 2004
Summary: This paper presents a method for improving the quality of degraded documents by removing noise and enhancing text. The histogram of a degraded document is analyzed to find the approximate gray-value ranges of text, graph (i.e., photograph), and background pixels. After the graph pixels are identified, they are replaced by background pixels. The agent-growing method described by S. H. Yen and M. C. Shih (2000) is then applied to smooth the noisy background, yielding a document with clearly readable text and a clean background. Finally, the graph pixels are restored, so that the degraded document ends up with much better text quality while any photographs are preserved. Experiments verifying the efficacy of the proposed method and comparisons with some existing techniques are also presented.
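The order of steps in the summary (set graph pixels aside, smooth the background, restore the graph pixels) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `gray_histogram` and `enhance` and the parameters `text_range` and `graph_range` are hypothetical, the gray-value ranges are passed in by hand because the summary does not say how they are derived from the histogram, and the agent-growing smoothing is replaced by a crude stand-in that flattens every non-text pixel to the most frequent background value.

```python
from collections import Counter


def gray_histogram(image):
    """Count how often each gray value occurs in a 2-D list of ints (0-255)."""
    return Counter(v for row in image for v in row)


def enhance(image, text_range, graph_range):
    """Illustrative three-step pipeline; ranges are inclusive (low, high) pairs.

    Assumption: the ranges are supplied by the caller. The paper derives them
    from the histogram by a procedure the summary does not describe.
    """
    hist = gray_histogram(image)

    def in_range(value, bounds):
        return bounds[0] <= value <= bounds[1]

    # Background estimate: most frequent gray value outside both ranges.
    # Raises ValueError if the image has no such pixel.
    candidates = [
        v for v in hist
        if not in_range(v, text_range) and not in_range(v, graph_range)
    ]
    background = max(candidates, key=lambda v: hist[v])

    work = [row[:] for row in image]

    # Step 1: remember graph pixels, then replace them with background.
    graph_pixels = {}
    for y, row in enumerate(work):
        for x, v in enumerate(row):
            if in_range(v, graph_range):
                graph_pixels[(y, x)] = v
                work[y][x] = background

    # Step 2: smooth the background. Stand-in only, NOT agent growing:
    # every non-text pixel is set to the background estimate.
    for y, row in enumerate(work):
        for x, v in enumerate(row):
            if not in_range(v, text_range):
                work[y][x] = background

    # Step 3: restore the graph pixels that were set aside in step 1.
    for (y, x), v in graph_pixels.items():
        work[y][x] = v

    return work


page = [
    [250, 245, 20],
    [250, 120, 250],
    [248, 250, 30],
]
cleaned = enhance(page, text_range=(0, 60), graph_range=(100, 140))
```

In this toy input the noisy background values 245 and 248 are flattened to 250, while the text values 20 and 30 and the graph value 120 pass through unchanged. Setting graph pixels aside before smoothing is what keeps the smoothing step from altering photographs.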
ISBN: 0769520847, 9780769520841
DOI: 10.1109/MULMM.2004.1264984