WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Weber, Maurice, Siebenschuh, Carlo, Butler, Rory, Alexandrov, Anton, Thanner, Valdemar, Tsolakis, Georgios, Jabbar, Haris, Foster, Ian, Li, Bo, Stevens, Rick, Zhang, Ce
Date of Publication 15.12.2023
Journal Article
Published in arXiv.org (15.12.2023)
Paper