PARAGRAPH RECOGNITION IN AN OPTICAL CHARACTER RECOGNITION (OCR) PROCESS

An image processing apparatus for detecting paragraphs in a textual image includes an input component for receiving an input image in which textual lines and words have been identified and a page classification component for classifying the input image as a first or second page type. The apparatus a...

Full description

Saved in:
Bibliographic Details
Main Authors UZELAC, Aleksandar, GALIC, Sasa, RADAKOVIC, Bogdan
Format Patent
LanguageEnglish
French
German
Published 30.05.2018
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:An image processing apparatus for detecting paragraphs in a textual image includes an input component for receiving an input image in which textual lines and words have been identified and a page classification component for classifying the input image as a first or second page type. The apparatus also includes a paragraph detection component for classifying all textual lines on the input image as a beginning paragraph line or a continuation paragraph line. The apparatus is also provided with a paragraph creation component for creating paragraphs that include textual lines between two successive beginning paragraph lines, including a first of the two successive beginning paragraph lines. The paragraphs that have been identified may be classified by the type of alignment they exhibit. For instance, paragraphs may be classified according to whether they are left aligned, right aligned, center aligned or justified.
Bibliography:Application Number: EP20110753918