HMM-Based Lexicon-Driven and Lexicon-Free Word Recognition for Online Handwritten Indic Scripts


Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, no. 4, pp. 670-682
Main Authors	Bharath, A.; Madhvanath, S.
Format	Journal Article
Language	English
Published	Los Alamitos, CA: IEEE; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE); 01.04.2012
Subjects	Algorithms | Applied sciences | Artificial intelligence | Automatic Data Processing | Bag of symbols | Character recognition | Computer science; control theory; systems | Data driven modelling | Databases, Factual | Devanagari | Exact sciences and technology | Feature extraction | Handwriting | Handwriting recognition | Hidden Markov model | Hidden Markov models | India | Ink | Internet | Lexicon | Lexicon driven | Lexicon free | Manuscript character | Markov Chains | Markov model | Model driven architecture | Natural language | Online handwriting recognition | Optical character recognition | Pattern recognition | Pattern Recognition, Automated - methods | Pattern recognition. Digital image processing. Computational geometry | Phonetics | Pruning (tree) | Reading | Scripts | Shape | Symbol | Symbol order variation | Tamil | Typography | Word | Word recognition | Writing
Summary:	Research on recognizing online handwritten words in Indic scripts is at an early stage compared to that for Latin and Oriental scripts. In this paper, we address this problem specifically for two major Indic scripts: Devanagari and Tamil. In contrast to previous approaches, the techniques we propose are largely data-driven and script-independent. We propose two different techniques for word recognition based on Hidden Markov Models (HMMs): lexicon-driven and lexicon-free. The lexicon-driven technique models each word in the lexicon as a sequence of symbol HMMs according to a standard symbol writing order derived from the phonetic representation. The lexicon-free technique uses a novel Bag-of-Symbols representation of the handwritten word that is independent of symbol order and allows rapid pruning of the lexicon. On handwritten Devanagari word samples featuring both standard and nonstandard symbol writing orders, a combination of lexicon-driven and lexicon-free recognizers significantly outperforms either of them used in isolation. In contrast, most Tamil word samples feature the standard symbol order, and the lexicon-driven recognizer outperforms the lexicon-free one as well as their combination. The best recognition accuracies obtained for 20,000-word lexicons are 87.13 percent for Devanagari when the two recognizers are combined, and 91.8 percent for Tamil using the lexicon-driven technique.
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2011.234
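
Illustration:	The Summary describes a Bag-of-Symbols representation that ignores symbol order and is used to prune the lexicon. This record does not give the paper's actual scoring or pruning rule, so the following is only a minimal sketch of the general idea, under stated assumptions: symbols are treated as plain string labels, the bag is a multiset of those labels, and candidates are ranked by a simple multiset difference. The names `bag_of_symbols`, `bag_distance`, `prune_lexicon`, and the toy `LEXICON` are hypothetical and do not come from the paper.

```python
from collections import Counter


def bag_of_symbols(symbols):
    """Order-independent multiset of symbol labels."""
    return Counter(symbols)


def bag_distance(bag_a, bag_b):
    """Count the symbols that would have to be added or removed
    to turn one bag into the other (assumed distance, not the paper's)."""
    missing = bag_a - bag_b  # symbols present in a but not in b
    extra = bag_b - bag_a    # symbols present in b but not in a
    return sum(missing.values()) + sum(extra.values())


def prune_lexicon(recognized_symbols, lexicon, keep=2):
    """Return the `keep` lexicon words whose bags are closest to the
    bag of recognized symbols; ties are broken alphabetically."""
    query = bag_of_symbols(recognized_symbols)
    ranked = sorted(
        lexicon.items(),
        key=lambda item: (bag_distance(query, bag_of_symbols(item[1])), item[0]),
    )
    return [word for word, _ in ranked[:keep]]


# Hypothetical toy lexicon: word -> symbols in the standard writing order.
LEXICON = {
    "word_a": ["s1", "s2", "s3"],
    "word_b": ["s1", "s4"],
    "word_c": ["s2", "s2", "s5"],
}

# The symbols of word_a written in a nonstandard order still match its bag.
print(prune_lexicon(["s3", "s1", "s2"], LEXICON, keep=2))
```

Because the comparison uses multisets rather than sequences, a word written in a nonstandard symbol order yields the same bag as the standard order. That is the property the Summary credits with handling symbol order variation. A full recognizer would then rescore the pruned candidates, which this sketch does not attempt.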