The Power of Regular Expressions in Recognizing Dates and Epochs
Published in | 2021 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI) pp. 1 - 3 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.07.2021 |
Subjects | |
Summary: | The digitization of cultural heritage objects allows librarians, historians and researchers to identify common elements such as the lifestyle of the inhabitants, their traditions, the distribution of the population by area, and many other points of interest in the study of communities and human behavior throughout history. For such analyses, one of the most important tasks is placing in time the events in which the actors were involved. Calendar dates and epochs play an important role here, but they are not always easy to use because they can be expressed in various forms, or may even have been altered during data collection. Standardizing temporal attributes is therefore a key step before any analysis. Unfortunately, neither Machine Translation techniques nor NLP libraries provide adequate support: Machine Translation loses the context, and NLP libraries often support only texts written in English that comply with the spelling rules of the language. Regular expressions, by contrast, can be adapted to any language. By applying a series of regular expressions that identify both calendar dates and epochs expressed in centuries and millennia, it was possible to standardize approximately 95% of the 6,630 distinct records related to cultural assets published in digital format by the National Heritage Institute of Romania. |
---|---|
DOI: | 10.1109/ECAI52376.2021.9515139 |
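
The summary describes applying a series of regular expressions to recognize calendar dates and epochs given as centuries or millennia. The record does not reproduce the paper's actual expressions, so the following is only a minimal Python sketch of the general idea. The patterns, the Romanian abbreviations they accept ("sec.", "secolul", "mil.", "mileniul"), the output format and the names `DATE_RE`, `CENTURY_RE`, `MILLENNIUM_RE`, `roman_to_int` and `normalize` are all assumptions made for illustration. Era markers (before or after the common era) are not handled.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns, NOT the ones used in the paper.
# Day.month.year with ".", "/" or "-" as separator, e.g. 01.07.2021
DATE_RE = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")
# Century, e.g. "sec. XIX", "secolul al XVIII-lea", "sec. 19"
CENTURY_RE = re.compile(
    r"\b(?i:sec(?:olul)?)\.?\s*(?:al\s+)?(?P<num>[IVXLCDM]+|\d{1,2})\b"
)
# Millennium, e.g. "mileniul II", "mil. 3"
MILLENNIUM_RE = re.compile(
    r"\b(?i:mil(?:eniul)?)\.?\s*(?:al\s+)?(?P<num>[IVXLCDM]+|\d{1,2})\b"
)

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral to an integer, scanning right to left."""
    total = 0
    prev = 0
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # subtractive notation, e.g. the I in IX
        else:
            total += value
            prev = value
    return total


def _to_int(token: str) -> int:
    return int(token) if token.isdigit() else roman_to_int(token)


def normalize(text: str) -> Optional[str]:
    """Return an ISO date, a 'start/end' year range, or None if nothing matches."""
    m = DATE_RE.search(text)
    if m:
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            return None  # looks like a date but is not a valid one
    m = CENTURY_RE.search(text)
    if m:
        n = _to_int(m.group("num"))
        return f"{(n - 1) * 100 + 1:04d}/{n * 100:04d}"
    m = MILLENNIUM_RE.search(text)
    if m:
        n = _to_int(m.group("num"))
        return f"{(n - 1) * 1000 + 1:04d}/{n * 1000:04d}"
    return None
```

Under these assumptions, `normalize("sec. XIX")` yields the year range `1801/1900` and `normalize("01.07.2021")` yields `2021-07-01`. Records that match no pattern return `None` and would need separate handling.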