Annif and Finto AI : Developing and Implementing Automated Subject Indexing

Bibliographic Details
Published in: JLIS.it : Italian journal of library and information science, Vol. 13, no. 1, pp. 265-282
Main Authors: Suominen, Osma; Lehtinen, Mona; Inkinen, Juho
Format: Journal Article
Language: English
Publisher: EUM-Edizioni Università di Macerata; University of Florence (Macerata, 01.01.2022)

Abstract: Manually indexing documents for subject-based access is a labour-intensive process that can be automated using AI technology. Algorithms for text classification must be trained and tested with examples of indexed documents, which can be obtained from existing bibliographic databases and digital collections. The National Library of Finland has created Annif, an open source toolkit for automated subject indexing and classification. Annif is multilingual, independent of the indexing vocabulary, and modular. It integrates many text classification algorithms, including Maui, fastText, Omikuji, and a neural network model based on TensorFlow. The best results can often be obtained by combining several algorithms. Many document corpora have been used for training and evaluating Annif. Finding the algorithms and configurations that give the best quality is an ongoing effort.

In May 2020, we launched Finto AI, a service for automated subject indexing based on Annif. It provides a simple Web form for obtaining subject suggestions for text. The functionality is also available as a REST API. Many document repositories and the cataloguing system for electronic publications at the National Library of Finland are using it to integrate semi-automated subject indexing into their metadata workflows. In the future, we are going to extend Annif with more algorithms and new functionality, and to integrate Finto AI with other metadata management workflows. [Publisher's text].
Keywords: Automated subject indexing; Artificial intelligence; Machine learning; Metadata.
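The abstract notes that Finto AI's subject suggestions are also available through a REST API. The Python sketch here shows how a client might call it using only the standard library. It assumes the service exposes Annif's REST endpoint layout, `POST {base}/projects/{project_id}/suggest` with form-encoded `text`, `limit` and `threshold` fields, returning JSON whose `results` list holds `uri`, `label` and `score` entries. The base URL `https://ai.finto.fi/v1`, the project ID `yso-en` and the helper names are assumptions for illustration, so check the service's own API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the Finto AI REST API (Annif-style /v1 endpoints).
BASE_URL = "https://ai.finto.fi/v1"


def build_suggest_request(project_id: str, text: str, limit: int = 10,
                          threshold: float = 0.0,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a project for subject suggestions."""
    # Assumed endpoint layout: {base}/projects/{project_id}/suggest
    url = f"{base_url}/projects/{urllib.parse.quote(project_id)}/suggest"
    # Assumption: the suggest endpoint accepts form-encoded parameters.
    body = urllib.parse.urlencode(
        {"text": text, "limit": limit, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded",
                 "Accept": "application/json"},
        method="POST",
    )


def parse_suggestions(payload: str) -> list[tuple[str, str, float]]:
    """Turn a JSON response into (uri, label, score) tuples, highest score first."""
    # Assumed response shape: {"results": [{"uri": ..., "label": ..., "score": ...}]}
    results = json.loads(payload).get("results", [])
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["uri"], r["label"], float(r["score"])) for r in ranked]


def suggest(project_id: str, text: str, limit: int = 10) -> list[tuple[str, str, float]]:
    """Send the request and parse the reply (requires network access)."""
    req = build_suggest_request(project_id, text, limit=limit)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_suggestions(resp.read().decode("utf-8"))
```

For example, `suggest("yso-en", "Automated subject indexing of digital library collections", limit=5)` would return up to five (URI, label, score) tuples, best first, provided the assumed endpoint and project exist. Keeping request construction and response parsing apart from the network call lets both be checked offline, which is convenient when wiring suggestions into a metadata workflow.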
Copyright: (c) Casalini Libri, 50014 Fiesole (Italy), www.casalini.it; (c) 2022 University of Florence
DOI: 10.4403/jlis.it-12740
Discipline: Library & Information Science
ISSN / EISSN: 2038-1026
Geographic location: Finland
Peer reviewed: Yes
Subject terms: Algorithms; Analysis; Artificial intelligence; Automated subject indexing; Controlled vocabularies; Indexing; Machine learning; Neural networks; Online databases; Technology application
URI: http://digital.casalini.it/10.4403/jlis.it-12740