The NLM indexer assignment dataset: a new large-scale dataset for reviewer assignment research

Bibliographic Details
Published in: Journal of the Association for Information Science and Technology Vol. 74; no. 2; p. 205
Main Authors: Rae, Alastair R; Mork, James G; Demner-Fushman, Dina
Format: Journal Article
Language: English
Published: United States 01.02.2023
Summary: MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers. High-quality, timely indexing is only possible when articles are assigned to indexers with suitable expertise. This paper introduces the NLM indexer assignment dataset: a large dataset of 4.2 million indexer-article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer-article assignment is also found to be useful domain-adaptive pre-training for the closely related task of reviewer assignment.
ISSN: 2330-1635
DOI: 10.1002/asi.24722