User-Defined Expected Error Rate in OCR Postprocessing by Means of Automatic Threshold Estimation
Published in | 2010 12th International Conference on Frontiers in Handwriting Recognition pp. 405 - 409 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.11.2010 |
Subjects | |
Summary: | This work presents a method for automatically estimating a threshold that lets the user of an OCR system define an expected error rate. When the OCR output is post-processed using a language model, a probability or reliability index (or a "transformation cost") is usually obtained, reflecting the likelihood (or its inverse) that the string of OCR hypotheses belongs to the model. By applying a threshold to this index (or cost) to reject the less reliable hypotheses, a variable level of expected accuracy can be imposed on the output. It is much more convenient for the user to "fix" the expected error rate at an acceptable level than to deal with an arbitrary threshold. Of course, difficult tasks will still result in high reject rates and easier tasks in lower reject rates. |
---|---|
ISBN: | 1424483530 9781424483532 |
DOI: | 10.1109/ICFHR.2010.126 |
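
The summary describes rejecting hypotheses whose transformation cost exceeds a threshold chosen to meet a user-defined error rate. This record does not give the paper's estimation procedure, so the following is only a minimal sketch of the general idea, assuming a labelled validation set of `(cost, is_correct)` pairs is available. The function name `estimate_threshold` and the data layout are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def estimate_threshold(
    samples: List[Tuple[float, bool]], target_error_rate: float
) -> Optional[Tuple[float, float, float]]:
    """Pick the largest cost threshold whose accepted set meets the target.

    samples: (cost, is_correct) pairs from a labelled validation set
             (an assumption; lower cost means a more reliable hypothesis).
    Returns (threshold, error_rate_among_accepted, reject_rate),
    or None if no threshold satisfies the target.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    n = len(ordered)
    best = None
    errors = 0
    i = 0
    while i < n:
        cost = ordered[i][0]
        # Hypotheses with equal cost are accepted or rejected together.
        j = i
        while j < n and ordered[j][0] == cost:
            if not ordered[j][1]:
                errors += 1
            j += 1
        accepted = j  # everything with cost <= current cost
        if errors / accepted <= target_error_rate:
            best = (cost, errors / accepted, (n - accepted) / n)
        i = j
    return best


if __name__ == "__main__":
    validation = [(0.1, True), (0.2, True), (0.3, False), (0.4, True), (0.9, False)]
    print(estimate_threshold(validation, 0.25))
```

In this sketch a stricter target error rate yields a lower threshold and therefore a higher reject rate, which matches the trade-off noted in the summary.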