Text and Non-text Separation in Scanned Color-Official Documents

Bibliographic Details
Published in: Computer Vision, Graphics, and Image Processing, pp. 231-242
Main Authors: Nandedkar, Amit Vijay; Mukherjee, Jayanta; Sural, Shamik
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing, 01.01.2017
Series: Lecture Notes in Computer Science

More Information
Summary: Official documents consist of text and non-text elements such as logos, stamps, and signatures. Separating these elements in a scanned document plays a significant role in document image retrieval, recognition, and verification. This paper presents a novel scheme to separate the text and non-text elements of official documents using part-based features. The work exploits the fact that text and non-text elements have distinctive intensity distributions in the HSV color space. A new approach to computing part-based features from the S (saturation) and V (value) channels is proposed. Text and non-text components are classified using a majority voting scheme and K approximate nearest neighbors. The knowledge base acquired during training is indexed with a k-d tree. The method is then extended to detect logos, stamps, and signatures. Experimental results show the effectiveness of the proposed approach.
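The summary names the pipeline stages but not the exact feature definition or parameters, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. The hypothetical `part_features` helper splits a component's RGB patch into a `grid` x `grid` set of parts and concatenates normalized S and V histograms for each part (this feature design is an assumption). The hypothetical `build_kdtree`, `knn`, and `classify` helpers index the labelled training features in a k-d tree and label a query by majority vote over its K nearest neighbours. The `eps` parameter uses the common (1 + eps)-approximate pruning rule as a stand-in for whatever approximate search the paper uses.

```python
import colorsys
import heapq
from collections import Counter


def part_features(patch, grid=2, bins=4):
    """Hypothetical part-based feature: per-part histograms of S and V.

    patch: list of rows of (r, g, b) tuples with values in 0..255.
    The patch is split into grid x grid parts; for each part a normalized
    S histogram and a normalized V histogram (`bins` bins each) are appended.
    Hue is ignored, following the summary's focus on the S and V channels.
    """
    h, w = len(patch), len(patch[0])
    feats = []
    for gi in range(grid):
        for gj in range(grid):
            s_hist, v_hist, n = [0] * bins, [0] * bins, 0
            for y in range(gi * h // grid, (gi + 1) * h // grid):
                for x in range(gj * w // grid, (gj + 1) * w // grid):
                    r, g, b = patch[y][x]
                    _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
                    # min() keeps s == 1.0 or v == 1.0 in the last bin
                    s_hist[min(int(s * bins), bins - 1)] += 1
                    v_hist[min(int(v * bins), bins - 1)] += 1
                    n += 1
            n = max(n, 1)  # avoid division by zero for empty parts
            feats += [c / n for c in s_hist] + [c / n for c in v_hist]
    return feats


def build_kdtree(points, depth=0):
    """Build a k-d tree from (feature_vector, label) pairs.

    Each node is (point, split_axis, left_subtree, right_subtree).
    """
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn(root, query, k, eps=0.0):
    """Return up to k (squared_distance, label) pairs, nearest first.

    eps = 0 gives exact search; eps > 0 prunes subtrees more aggressively
    (the usual (1 + eps)-approximate rule), trading exactness for speed.
    """
    heap = []  # max-heap via negated distances: (-d2, insertion_id, label)
    next_id = 0

    def visit(node):
        nonlocal next_id
        if node is None:
            return
        (vec, label), axis, left, right = node
        d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, next_id, label))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, next_id, label))
        next_id += 1
        diff = query[axis] - vec[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # The far side can only help if the splitting plane is closer than
        # the current k-th best distance (shrunk by (1 + eps) if approximate).
        if len(heap) < k or diff * diff * (1 + eps) ** 2 < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted(((-nd, label) for nd, _, label in heap), key=lambda t: t[0])


def classify(root, query, k=3, eps=0.0):
    """Majority vote over the labels of the k (approximate) nearest neighbours."""
    votes = Counter(label for _, label in knn(root, query, k, eps))
    return votes.most_common(1)[0][0]
```

In this sketch a larger `k` smooths the vote, while a larger `eps` visits fewer tree nodes at the risk of missing true neighbours; neither default value comes from the paper.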
ISBN: 9783319681238, 3319681230
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-319-68124-5_20