A survey of document image word spotting techniques


Bibliographic Details
Published in Pattern Recognition Vol. 68; pp. 310-332
Main Authors Giotis, Angelos P., Sfikas, Giorgos, Gatos, Basilis, Nikou, Christophoros
Format Journal Article
Language English
Published Elsevier Ltd 01.08.2017
Summary:
• This work reviews the word spotting methods for document indexing.
• The nature of texts addressed by word spotting techniques is analyzed.
• The core steps that compose a word spotting system are thoroughly explored.
• Several boosting mechanisms which enhance the retrieved results are examined.
• Results achieved by the state of the art imply that there are still goals to be reached.

Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative solution to optical character recognition (OCR), which is rather inefficient for recognizing text of degraded quality and unknown fonts usually appearing in printed text, or writing style variations in handwritten documents. Over the past decade there has been a growing interest in addressing document indexing using word spotting which is reflected by the continuously increasing number of approaches. However, there exist very few comprehensive studies which analyze the various aspects of a word spotting system. This work aims to review the recent approaches as well as fill the gaps in several topics with respect to the related works. The nature of texts and inherent challenges addressed by word spotting methods are thoroughly examined. After presenting the core steps which compose a word spotting system, we investigate the use of retrieval enhancement techniques based on relevance feedback which improve the retrieved results. Finally, we present the datasets which are widely used for word spotting, we describe the evaluation standards and measures applied for performance assessment and discuss the results achieved by the state of the art.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2017.02.023