User-Defined Expected Error Rate in OCR Postprocessing by Means of Automatic Threshold Estimation

Bibliographic Details
Published in	2010 12th International Conference on Frontiers in Handwriting Recognition, pp. 405-409
Main Authors	Navarro-Cerdan, J Ramon; Arlandis, Joaquim; Perez-Cortes, Juan-Carlos; Llobet, Rafael
Format	Conference Proceeding
Language	English
Published	IEEE 01.11.2010
Subjects	Biological system modeling; Error analysis; Estimation; Handwriting recognition; Helium; Optical character recognition software; Training
Summary:	This work presents a method for automatically estimating a threshold that lets the user of an OCR system define an expected error rate. When the OCR output is post-processed using a language model, a probability, reliability index, or "transformation cost" is usually obtained, reflecting the likelihood (or its inverse) that the string of OCR hypotheses belongs to the model. Applying a threshold to this index (or cost) to reject the less reliable hypotheses imposes a variable level of expected accuracy on the output. It is much more convenient for the user to "fix" the expected error rate at an acceptable level than to deal with an arbitrary threshold. The consequence is a higher reject rate for difficult tasks and a lower reject rate for easier ones.
ISBN:	1424483530 9781424483532
DOI:	10.1109/ICFHR.2010.126
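
Illustration:	The summary does not say how the threshold is estimated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a labeled validation sample in which each post-processed hypothesis has a transformation cost (lower means more reliable) and a flag saying whether it was correct. The function name `estimate_threshold` and its parameters are hypothetical.

```python
from typing import Sequence, Tuple


def estimate_threshold(
    costs: Sequence[float],
    correct: Sequence[bool],
    target_error_rate: float,
) -> Tuple[float, float]:
    """Pick the largest cost threshold whose accepted set meets the target.

    A hypothesis is accepted when cost <= threshold. Returns
    (threshold, reject_rate) measured on the labeled sample.
    Assumption: lower cost means a more reliable hypothesis.
    """
    if len(costs) != len(correct):
        raise ValueError("costs and correct must have the same length")

    # Sort hypotheses from most to least reliable.
    ranked = sorted(zip(costs, correct), key=lambda pair: pair[0])
    n = len(ranked)

    best_threshold = float("-inf")  # reject everything by default
    best_accepted = 0
    errors = 0
    i = 0
    while i < n:
        # Hypotheses with equal cost are accepted or rejected together.
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            if not ranked[j][1]:
                errors += 1
            j += 1
        # Error rate among the j hypotheses accepted at this threshold.
        if errors / j <= target_error_rate:
            best_threshold = ranked[i][0]
            best_accepted = j
        i = j

    reject_rate = 1.0 - best_accepted / n if n else 0.0
    return best_threshold, reject_rate


if __name__ == "__main__":
    sample_costs = [0.1, 0.2, 0.3, 0.4, 0.5]
    sample_correct = [True, True, False, True, False]
    print(estimate_threshold(sample_costs, sample_correct, 0.25))
```

On a harder task, errors appear at lower costs, so the same target error rate forces a lower threshold and a higher reject rate, which matches the trade-off described in the summary.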