LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents

Bibliographic Details
Published in: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3599-3609
Main Authors: Mathur, Puneet; Jain, Rajiv; Mehra, Ashutosh; Gu, Jiuxiang; Dernoncourt, Franck; N, Anandhavelu; Tran, Quan; Kaynig-Fittkau, Verena; Nenkova, Ani; Manocha, Dinesh; Morariu, Vlad I.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.01.2023
Summary: Digital documents often contain images and scanned text. Parsing such visually-rich documents is a core task for workflow automation, but it remains challenging because most documents do not encode explicit layout information, e.g., how characters and words are grouped into boxes and ordered into larger semantic entities. Current state-of-the-art layout extraction methods struggle with such documents because they rely on word sequences already being in the correct reading order and do not exploit the documents' hierarchical structure. We propose LayerDoc, an approach that uses visual features, textual semantics, and spatial coordinates along with constraint inference to extract the hierarchical layout structure of documents in a bottom-up, layer-wise fashion. LayerDoc recursively groups smaller regions into larger semantic elements in 2D to infer complex nested hierarchies. Experiments show that our approach outperforms competitive baselines by 10-15% on three diverse datasets of forms and mobile app screen layouts for the tasks of spatial region classification, higher-order group identification, layout hierarchy extraction, reading order detection, and word grouping.
ISSN: 2642-9381
DOI: 10.1109/WACV56688.2023.00360
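
The bottom-up, layer-wise grouping mentioned in the summary can be illustrated with a toy sketch. This is not the LayerDoc model: the record gives no implementation details, so the sketch replaces the learned visual, textual, and constraint-inference components with a plain spatial-gap threshold. All names here (`Node`, `group_layer`, `build_hierarchy`, the `group-L1` style labels, and the gap values) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), origin at top-left


@dataclass
class Node:
    box: Box
    label: str
    children: List["Node"] = field(default_factory=list)


def union(a: Box, b: Box) -> Box:
    """Smallest box that encloses both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def gap(a: Box, b: Box) -> float:
    """Larger of the horizontal and vertical gaps (0 if the boxes touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return max(dx, dy)


def reading_key(node: Node) -> Tuple[float, float]:
    """Crude reading order: top-to-bottom, then left-to-right."""
    return (node.box[1], node.box[0])


def group_layer(nodes: List[Node], max_gap: float, label: str) -> List[Node]:
    """Build one layer: merge nodes lying within max_gap of each other (single-link)."""
    groups: List[List[Node]] = []
    for node in nodes:
        near, far = [], []
        for g in groups:
            close = any(gap(node.box, m.box) <= max_gap for m in g)
            (near if close else far).append(g)
        # The new node joins, and thereby fuses, every group it is close to.
        groups = far + [[node] + [m for g in near for m in g]]

    parents = []
    for g in groups:
        if len(g) == 1:
            parents.append(g[0])  # nothing to merge: carry the node up unchanged
            continue
        box = g[0].box
        for m in g[1:]:
            box = union(box, m.box)
        parents.append(Node(box, label, sorted(g, key=reading_key)))
    return sorted(parents, key=reading_key)


def build_hierarchy(leaves: List[Node], gaps: List[float]) -> List[Node]:
    """Apply group_layer repeatedly, one gap threshold per layer."""
    layer = leaves
    for level, max_gap in enumerate(gaps, start=1):
        layer = group_layer(layer, max_gap, f"group-L{level}")
    return layer


if __name__ == "__main__":
    words = [
        Node((0.0, 0.0, 40.0, 10.0), "First"),
        Node((45.0, 0.0, 80.0, 10.0), "name"),
        Node((0.0, 60.0, 40.0, 70.0), "Email"),
        Node((300.0, 0.0, 340.0, 10.0), "Total"),
    ]
    for root in build_hierarchy(words, gaps=[10.0, 60.0]):
        print(root.label, root.box, [c.label for c in root.children])
```

Each pass produces one layer of the hierarchy, and the nodes of one layer become the inputs to the next, which is the sense in which the grouping is recursive and bottom-up. A learned system would replace the fixed `max_gap` test with a model's decision about whether two regions belong together.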