User-Defined Expected Error Rate in OCR Postprocessing by Means of Automatic Threshold Estimation

Bibliographic Details
Published in 2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 405-409
Main Authors Navarro-Cerdan, J Ramon; Arlandis, Joaquim; Perez-Cortes, Juan-Carlos; Llobet, Rafael
Format Conference Proceeding
Language English
Published IEEE 01.11.2010
Summary: This work presents a method for automatically estimating a threshold that lets the user of an OCR system define an expected error rate. When OCR output is post-processed using a language model, a probability, a reliability index (or a "transformation cost") is usually obtained, reflecting the likelihood (or its inverse) that the string of OCR hypotheses belongs to the model. By applying a threshold to this index (or cost) to reject the less reliable hypotheses, a variable level of expected accuracy can be imposed on the output. For the user, it is much more convenient to fix the expected error rate at an acceptable level than to deal with an arbitrary threshold. As a consequence, difficult tasks will always yield high reject rates and easier tasks lower ones.
ISBN: 1424483530, 9781424483532
DOI: 10.1109/ICFHR.2010.126
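
The idea in the summary, turning a user-chosen error rate into a rejection threshold on the transformation cost, can be illustrated with a minimal sketch. This is not the estimation procedure of the paper, which the record does not describe; it is a simple empirical search over a labelled validation set. The function name `estimate_threshold`, its parameters, and the convention that a lower cost means a more reliable hypothesis are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def estimate_threshold(
    costs: Sequence[float],
    correct: Sequence[bool],
    target_error: float,
) -> Tuple[float, float]:
    """Pick the largest cost threshold whose accepted set meets target_error.

    Hypothetical helper, not the paper's method. A hypothesis is accepted
    when its cost is <= the returned threshold. Returns the pair
    (threshold, reject_rate) measured on the validation data.
    """
    if len(costs) != len(correct):
        raise ValueError("costs and correct must have the same length")
    if not costs:
        raise ValueError("validation set is empty")

    # Visit hypotheses from most reliable (lowest cost) to least reliable.
    order = sorted(range(len(costs)), key=lambda i: costs[i])

    best_threshold = float("-inf")  # default: reject everything
    accepted_at_best = 0
    errors = 0
    for rank, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        # Only cut between distinct cost values so ties stay together.
        last_of_tie = rank == len(order) or costs[order[rank]] != costs[i]
        if last_of_tie and errors / rank <= target_error:
            best_threshold = costs[i]
            accepted_at_best = rank

    reject_rate = 1.0 - accepted_at_best / len(costs)
    return best_threshold, reject_rate
```

With a validation set in which most errors carry high costs, a strict target accepts only the low-cost hypotheses and rejects the rest; on a harder task, where errors are spread across all cost values, the same target forces a higher reject rate, which matches the trade-off noted in the summary.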