Automatic Subject Cataloguing at the German National Library
Published in | LIBER quarterly Vol. 35; no. 1; pp. 1 - 29 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | openjournals.nl 08.04.2025 |
Summary: | The German National Library (DNB) began developing solutions for automatic subject cataloguing 15 years ago, mainly because of the huge and ever-growing number of digital media works that needed to be indexed. Today, the DNB uses open source algorithms and frameworks to assign various types of thematic metadata automatically. This practice paper provides a deeper insight into automatic subject cataloguing at the DNB. We look at the data and vocabularies used as well as the different methods and approaches. The vocabulary for classification is based on the Dewey Decimal Classification (DDC). For verbal subject indexing we use the German Integrated Authority File (GND). The use case of automatic classification is divided into the assignment of DDC Subject Categories and DDC Short Numbers. Due to the large size of the GND vocabulary, the use case of automatic indexing is an extreme multi-label classification (XMLC) problem. We briefly report on the construction and performance of our models. Based on these use cases, we present some implementation aspects of our “subject cataloguing machine” EMa, the environment for automatic subject cataloguing in productive use. We point out the basic feature set and provide a high-level introduction to the productive EMa system. We describe the modular design of the EMa software architecture, with the open source software Annif as its central toolkit. The development of EMa is an ongoing task at the DNB. It requires continuous development and maintenance as well as technological and human resources. Applied research activities in the DNB's AI project are closely tied to EMa, ensuring that relevant scientific findings are integrated into its development. |
---|---|
ISSN: | 2213-056X |
DOI: | 10.53377/lq.19422 |