INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, PROGRAM, AND DOCUMENT READING SYSTEM

Bibliographic Details
Main Authors: HIRAKI KENJI, OCHIAI HIRONORI, NAKADA HIROYUKI, SONEDA TOSHIYUKI, TORIYA SOICHIRO
Format: Patent
Language: English, Japanese
Published: 30.01.2020
Summary: To provide an information processing device and method capable of identifying the item name and the item value of each item of a document from character data obtained by character recognition on an image of an atypical (non-fixed-format) document.
SOLUTION: An information processing device 100 includes: a storage unit which stores item name registration information and a mapping model representing correlation values between the position of the item name of each item of a document and the position of the item value corresponding to that item name; an acquisition unit which acquires image data of an atypical document; a character recognition unit which recognizes each character on the image data; an item name identification unit which identifies the item names of the document on the basis of the recognized character information and the item name registration information; a range identification unit which uses the mapping model to identify, for the item value corresponding to each identified item name, a range in which item values whose correlation value is equal to or greater than a threshold are located; an item value identification unit which extracts each character belonging to the identified range and identifies the item value corresponding to the identified item name on the basis of those characters; and an output information generation unit which generates output information that associates information on the identified item names with information on the identified item values.
SELECTED DRAWING: Figure 2
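The flow described in the summary (identify item names, look up a value range from the mapping model, collect the characters in that range, pair name with value) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Token` type, the grid coordinates, the contents of `REGISTERED_NAMES` and `MAPPING_MODEL`, and the representation of the mapping model as correlation values keyed by (column, row) offsets from the item name. Correlation values at or above the threshold are treated as in range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One recognized text fragment and its (hypothetical) grid position."""
    text: str
    col: int
    row: int


# Hypothetical item name registration information: printed label -> item key.
REGISTERED_NAMES = {"Invoice No.": "invoice_number", "Total": "total"}

# Hypothetical mapping model: for each item, a correlation value for each
# (d_col, d_row) offset from the item name to a candidate item value position.
MAPPING_MODEL = {
    "invoice_number": {(1, 0): 0.9, (2, 0): 0.4, (0, 1): 0.2},
    "total": {(1, 0): 0.7, (0, 1): 0.8, (1, 1): 0.1},
}


def identify_item_names(tokens):
    """Return (item key, token) pairs for tokens matching a registered name."""
    return [(REGISTERED_NAMES[t.text], t) for t in tokens if t.text in REGISTERED_NAMES]


def identify_range(item, name_token, threshold):
    """Return the grid cells whose correlation value is at or above the threshold."""
    return {
        (name_token.col + d_col, name_token.row + d_row)
        for (d_col, d_row), corr in MAPPING_MODEL[item].items()
        if corr >= threshold
    }


def identify_item_value(tokens, cells, name_tokens):
    """Join, in reading order, the non-label tokens that fall inside the range."""
    hits = [t for t in tokens if (t.col, t.row) in cells and t not in name_tokens]
    hits.sort(key=lambda t: (t.row, t.col))
    return " ".join(t.text for t in hits)


def read_document(tokens, threshold=0.5):
    """Associate each identified item name with its identified item value."""
    names = identify_item_names(tokens)
    name_tokens = {tok for _, tok in names}
    return {
        item: identify_item_value(tokens, identify_range(item, tok, threshold), name_tokens)
        for item, tok in names
    }


# Hypothetical recognition result for a small document.
SAMPLE_TOKENS = [
    Token("Invoice No.", 0, 0), Token("A-1024", 1, 0),
    Token("Total", 3, 0), Token("1,980", 3, 1),
]

if __name__ == "__main__":
    print(read_document(SAMPLE_TOKENS))
```

In this sketch the threshold is the only tuning parameter: raising it shrinks the range searched around each item name, so fewer characters are collected as the item value.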
Bibliography: Application Number: JP20180137833