Text and Non-text Separation in Scanned Color-Official Documents
Published in | Computer Vision, Graphics, and Image Processing, pp. 231-242 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Cham: Springer International Publishing, 01.01.2017 |
Series | Lecture Notes in Computer Science |
Summary: | Official documents consist of text and non-text elements such as logos, stamps, and signatures. Separating these elements in a scanned document plays a significant role in document image retrieval, recognition, and verification. This paper presents a novel scheme for separating the text and non-text elements of official documents using part-based features. The work exploits the fact that the intensity distributions of text and non-text elements in HSV color space are distinctive. A new approach to computing part-based features from the S and V channels is proposed. Text and non-text components are classified using a majority voting scheme and K-approximate nearest neighbors. The knowledge base acquired during training is indexed using a kD-tree. Subsequently, the method is extended to the detection of logos, stamps, and signatures. Experimental results show the effectiveness of the proposed approach. |
---|---|
ISBN: | 9783319681238 3319681230 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-319-68124-5_20 |
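
The pipeline outlined in the summary (part-based features from the S and V channels, a kD-tree index over training features, K-approximate nearest neighbors, and majority voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not specify the actual features, so the per-part mean and standard deviation of S and V, the helper names (`part_features`, `build_kdtree`, `knn`, `classify`), the parameters `n_parts`, `k`, and `eps`, and the toy pixel values are all assumptions made for illustration.

```python
import colorsys
from collections import Counter
from statistics import mean, pstdev


def part_features(pixels, n_parts=4):
    """Split a component's RGB pixels (0-255 tuples) into roughly n_parts chunks
    and describe each chunk by (mean S, std S, mean V, std V).
    Hypothetical feature; the record does not define the real one."""
    sv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1:] for r, g, b in pixels]
    size = max(1, len(sv) // n_parts)
    feats = []
    for i in range(0, len(sv), size):
        chunk = sv[i:i + size]
        s = [p[0] for p in chunk]
        v = [p[1] for p in chunk]
        feats.append((mean(s), pstdev(s), mean(v), pstdev(v)))
    return feats


def build_kdtree(points, depth=0):
    """Build a kD-tree from (feature_vector, label) pairs.
    Each node is (point, split_axis, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn(node, query, k, eps=0.0, best=None):
    """Return up to k (squared_distance, label) pairs nearest to query.
    eps > 0 prunes subtrees more aggressively, giving an approximate search."""
    if best is None:
        best = []
    if node is None:
        return best
    (vec, label), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    best.append((d2, label))
    best.sort(key=lambda t: t[0])
    del best[k:]  # keep only the k closest candidates seen so far
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn(near, query, k, eps, best)
    # Visit the far side only if it could still contain a closer point.
    if len(best) < k or (diff * (1 + eps)) ** 2 < best[-1][0]:
        knn(far, query, k, eps, best)
    return best


def classify(tree, pixels, k=3, eps=0.1):
    """Each part votes with the labels of its k approximate neighbors;
    the component receives the majority label."""
    votes = Counter()
    for feat in part_features(pixels):
        votes.update(label for _, label in knn(tree, feat, k, eps))
    return votes.most_common(1)[0][0] if votes else None


# Toy training data (assumed values): grey "text" strokes and a reddish "stamp".
text_px = [(20, 20, 20), (30, 30, 30), (25, 25, 25), (40, 40, 40)] * 4
stamp_px = [(200, 30, 30), (180, 20, 40), (210, 40, 35), (190, 25, 30)] * 4
train = ([(f, "text") for f in part_features(text_px)]
         + [(f, "non-text") for f in part_features(stamp_px)])
tree = build_kdtree(train)

print(classify(tree, [(35, 35, 35)] * 8))
print(classify(tree, [(205, 35, 35)] * 8))
```

Voting over parts rather than over a single whole-component descriptor means a few atypical parts do not decide the label on their own. In practice a library index such as SciPy's `cKDTree` would likely replace the hand-written tree.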