PP-StructureV2: A Stronger Document Analysis System
Main Authors | , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 11.10.2022 |
Summary: | A large amount of document data exists in unstructured form, such as raw
images without any text information. Designing a practical document image
analysis system is a meaningful but challenging task. In previous work, we
proposed an intelligent document analysis system, PP-Structure. To further
upgrade the function and performance of PP-Structure, we propose
PP-StructureV2 in this work, which contains two subsystems: Layout Information
Extraction and Key Information Extraction. First, we integrate an Image
Direction Correction module and a Layout Restoration module to enhance the
functionality of the system. Second, 8 practical strategies are utilized in
PP-StructureV2 for better performance. For the Layout Analysis model, we introduce
the ultra-lightweight detector PP-PicoDet and the knowledge distillation algorithm FGD
for model lightweighting, which increased the inference speed by 11 times with
comparable mAP. For the Table Recognition model, we utilize PP-LCNet, CSP-PAN and
SLAHead to optimize the backbone module, feature fusion module and decoding
module, respectively, which improved the table structure accuracy by 6% with
comparable inference speed. For the Key Information Extraction model, we introduce
VI-LayoutXLM, a visual-feature-independent LayoutXLM architecture, together with
the TB-YX sorting algorithm and the U-DML knowledge distillation algorithm, which
brought 2.8% and 9.1% improvements, respectively, on the Hmean of the Semantic
Entity Recognition and Relation Extraction tasks. All of the above-mentioned
models and code are open-sourced in the GitHub repository PaddleOCR. |
---|---|
DOI: | 10.48550/arxiv.2210.05391 |
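
Note on the Hmean metric cited in the summary: the record does not define it. Assuming it carries its usual meaning in OCR evaluation, the harmonic mean of precision and recall (equivalent to the F1 score), a minimal sketch of the computation might look like the following. The function name `hmean` and the counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score).

    Assumption: this is the conventional OCR definition of Hmean;
    the catalog record itself does not spell it out.
    """
    if precision + recall == 0:
        # Avoid division by zero when nothing was predicted correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
true_positives, false_positives, false_negatives = 90, 10, 30

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
score = hmean(precision, recall)

print(f"precision={precision:.3f} recall={recall:.3f} hmean={score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high Hmean by trading away recall for precision or the reverse.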