MC-OCR Challenge 2021: End-to-end system to extract key information from Vietnamese Receipts
Published in | 2021 RIVF International Conference on Computing and Communication Technologies (RIVF) pp. 1 - 5 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE 19.08.2021 |
Summary: | In the information age, quickly obtaining and extracting key information from massive and complex resources has become challenging. Extracting information from scanned or captured documents is one of the most demanding processes in many areas such as finance, accounting, and taxation. Recent advances in computer vision have substantially improved Optical Character Recognition (OCR), including its text detection and text recognition tasks. However, current OCR faces two challenges. The first is the quality of the input data, which is captured by mobile phone and is greatly affected by external factors such as lighting conditions, dynamic environments, or blurry content. The second is Key Information Extraction (KIE) from documents, a downstream task of OCR that has remained largely underexplored because the input documents carry not only the textual features extracted by OCR systems but also semantic visual features, which play a critical role in KIE yet are not fully utilized. In this paper, we propose an end-to-end system based on several state-of-the-art models from both computer vision and natural language processing to address the Mobile captured receipts OCR (MC-OCR) challenge, which comprises two tasks: (1) evaluating the quality of the captured receipt, and (2) recognizing the required fields of the receipt. Our code is publicly available at https://github.com/ndcuong9/MC_OCR |
---|---|
DOI: | 10.1109/RIVF51545.2021.9642083 |