Method and apparatus for formatting OCR text
Main Authors | , |
---|---|
Format | Patent |
Language | English |
Published | 20.06.2002 |
Edition | 7 |
Subjects | |
Summary: | After a document image has been scanned and processed by optical character recognition (OCR), the output OCR text is processed to determine a text format (typeface and font size) that matches the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated that matches a rendering of the word in that typeface to the width of the word in the originally scanned image. After all of the scaling factors have been calculated, a cluster analysis is performed to identify tight clusters of scaling factors for a typeface, which indicate a good typeface fit at a constant scaling factor (font size). |
---|---|
Bibliography: | Application Number: US20000738320 |
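
The word-width matching described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the per-character width tables, the typeface names `narrow` and `mono`, the sample words, and the 5% tolerance are all invented for illustration, and the "cluster analysis" is replaced by a simple sliding window over sorted scaling factors that finds the largest group lying within the tolerance.

```python
from statistics import median

# Hypothetical per-character advance widths at a nominal font size of 1.0.
# A real system would query a font engine; these values are invented.
TYPEFACE_WIDTHS = {
    "narrow": {"i": 0.25, "l": 0.25, "m": 0.80, "w": 0.75,
               "o": 0.50, "d": 0.50, "e": 0.45},
    "mono": {c: 0.60 for c in "ilmwode"},
}

# Hypothetical OCR output: (recognized word, word width measured in the scan).
WORDS = [("mill", 18.6), ("wood", 27.0), ("lime", 21.0),
         ("dome", 27.0), ("oil", 12.0)]


def rendered_width(word, widths):
    """Width of the word when rendered in a typeface at nominal size 1.0."""
    return sum(widths[c] for c in word)


def scaling_factors(words, widths):
    """One scaling factor per word: scanned width / nominal rendered width."""
    return [scanned / rendered_width(text, widths) for text, scanned in words]


def largest_cluster(factors, tolerance=0.05):
    """Largest group of factors within a relative tolerance of its minimum.

    A stand-in for a full cluster analysis: sort the factors and slide a
    window whose maximum may exceed its minimum by at most `tolerance`.
    """
    ordered = sorted(factors)
    best = []
    start = 0
    for end in range(len(ordered)):
        while ordered[end] > ordered[start] * (1 + tolerance):
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return best


def best_format(words, typefaces=TYPEFACE_WIDTHS, tolerance=0.05):
    """Pick the typeface with the largest cluster; its median is the size."""
    results = {}
    for name, widths in typefaces.items():
        cluster = largest_cluster(scaling_factors(words, widths), tolerance)
        results[name] = (len(cluster), median(cluster))
    winner = max(results, key=lambda n: results[n][0])
    return winner, results[winner][1]


if __name__ == "__main__":
    typeface, size = best_format(WORDS)
    print(typeface, round(size, 2))
```

In this toy data the scanned widths were generated from the `narrow` table, so all five words agree on one scaling factor for that typeface, while the `mono` factors scatter. A tight cluster therefore signals both the typeface and the font size, which is the idea the summary describes.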