Automatic processing of Historical Arabic Documents: A comprehensive Survey

Bibliographic Details
Published in	Pattern recognition Vol. 100; pp. 107144 - 1:107144-17
Main Authors	Ibn Khedher, Mohamed, Jmila, Houda, El-Yacoubi, Mounim A.
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.04.2020 Elsevier
Subjects	Artificial Intelligence; Computer Science; Data retrieval; Document and Text Processing; Historical Arabic Documents; Survey on Historical Arabic Documents; Text analysis; Text recognition; Writer identification
Summary:	• Challenges of automatic processing of historical Arabic documents (APHAD).
• Classification of APHAD applications into four tasks: data analysis, writer classification, data classification, and data retrieval.
• For each application, a survey of existing approaches is presented.
• For each application, the existing solutions are discussed and recommendations are suggested.
• Existing datasets and software for APHAD applications are surveyed.

National libraries and archives around the world hold a huge number of Historical Arabic Documents (HAD). Analyzing this type of data manually is difficult and costly, so an automatic process is required to exploit these documents more rapidly. Processing historical documents is a recent research subject that has seen remarkable growth in recent years. Processing Historical Arabic Documents is particularly challenging, first because of the complicated nature of Arabic script compared to other scripts, and second because the documents are ancient. This paper focuses on this difficult problem and provides a comprehensive survey of existing research work. First, we describe in detail the challenges that make the automatic processing of Historical Arabic Documents a difficult task. Second, we classify this task into four applications of automatic processing of HAD: i) analyze the document to extract the main text; ii) identify the writer of the document; iii) recognize some words or parts of the document in a reference dataset; and iv) retrieve and extract specific data from the document. For each application, existing approaches are surveyed and qualitatively described. Finally, we focus on available datasets and describe how they can be used in each application.
ISSN:	0031-3203 1873-5142
DOI:	10.1016/j.patcog.2019.107144