A Novel Dataset for Known and Unknown Ancient Arabic Manuscripts

Bibliographic Details
Published in 2022 20th International Conference on Language Engineering (ESOLEC) Vol. 20; pp. 60-65
Main Authors Al-homed, Lutfieh S., Jambi, Kamal M., Al-Barhamtoshy, Hassanin M.
Format Conference Proceeding
Language English
Published IEEE 12.10.2022
Summary: This paper presents a new dataset of ancient Arabic-Islamic manuscripts for detecting unknown manuscripts and distinguishing them from known ones. Unknown manuscripts are those that have been badly damaged by human or natural forces, such as humidity, temperature, and air pollution, which degraded their quality and destroyed their identifying information, such as the title, author, and date. Known manuscripts, by contrast, have a known title, author, etc. Recognizing unknown manuscripts is essential to advance the analysis process, facilitate information extraction from such degraded manuscripts, enable their indexing, and make them easy to access and retrieve. The objectives of the constructed dataset are as follows: 1) Collect a set of known and unknown manuscripts of similar forms and highlight the characteristics of the unknown manuscripts. 2) Promote the automatic detection and recognition of unknown manuscripts. 3) Formulate the recognition of unknown manuscripts as a supervised machine-learning problem, and improve this recognition using advances in machine learning and deep learning techniques. A total of 108 manuscripts were collected, divided equally between the known and unknown categories. Preliminary results showed that a decision tree classifier achieved an accuracy of 88% in classifying unknown manuscripts.
DOI:10.1109/ESOLEC54569.2022.10009168