The National Library of Medicine indexer assignment dataset: A new large‐scale dataset for reviewer assignment research

Bibliographic Details
Published in: Journal of the Association for Information Science and Technology, Vol. 74, no. 2, pp. 205–218
Main Authors: Rae, Alastair R.; Mork, James G.; Demner‐Fushman, Dina
Format: Journal Article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc.; Wiley Periodicals Inc; 1 February 2023

Abstract
MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers: high-quality, timely indexing is only possible when articles are assigned to indexers with suitable expertise. This article introduces the NLM indexer assignment dataset, a large dataset of 4.2 million indexer-article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer-article assignment is also found to be a useful domain‐adaptive pre‐training task for the closely related problem of reviewer assignment.
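
The abstract frames indexer assignment as an expert-matching problem: each incoming article should go to an indexer whose expertise fits it. As an illustration only, a classic content-based baseline represents each indexer by the text of articles they have previously indexed, scores indexers by TF-IDF cosine similarity to a new article, and then assigns articles greedily while respecting a per-indexer workload cap. This sketch is not the method or data format used in the article; the function names (`tokenize`, `build_idf`, `tfidf`, `cosine`, `rank_indexers`, `greedy_assign`), the whitespace tokenizer, the smoothed IDF formula, the plain-text indexer profiles, and the toy identifiers are all hypothetical assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase + whitespace split.
    return text.lower().split()


def build_idf(documents):
    """Smoothed inverse document frequency for every term seen in `documents`."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms missing from `idf` are dropped."""
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_indexers(article_text, profiles, idf):
    """(indexer, score) pairs, best match first; ties broken by indexer name."""
    query = tfidf(article_text, idf)
    scores = {name: cosine(query, tfidf(text, idf)) for name, text in profiles.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def greedy_assign(articles, profiles, idf, capacity):
    """Give each article to its best-ranked indexer with spare capacity.

    Articles are processed in dict order; an article is left unassigned
    if every indexer has already reached `capacity`.
    """
    load = {name: 0 for name in profiles}
    assignment = {}
    for article_id, text in articles.items():
        for name, _score in rank_indexers(text, profiles, idf):
            if load[name] < capacity:
                assignment[article_id] = name
                load[name] += 1
                break
    return assignment


if __name__ == "__main__":
    # Toy data (hypothetical): indexer profiles built from past article text.
    profiles = {
        "indexer_A": "cardiology heart failure myocardial infarction",
        "indexer_B": "oncology tumor chemotherapy breast cancer",
    }
    articles = {
        "pmid_1": "myocardial infarction outcomes in heart failure patients",
        "pmid_2": "chemotherapy response in breast cancer",
    }
    idf = build_idf(list(profiles.values()) + list(articles.values()))
    print(greedy_assign(articles, profiles, idf, capacity=1))
    # {'pmid_1': 'indexer_A', 'pmid_2': 'indexer_B'}
```

Greedy assignment is easy to reason about but not globally optimal: an early article can take the last slot of an indexer who would have suited a later article better. Capacity-constrained assignment that maximizes total similarity can instead be posed as a minimum-cost flow or linear-programming problem.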

Authors
Rae, Alastair R. (ORCID 0000-0003-4675-0627; email alastair.rae@nih.gov), National Library of Medicine
Mork, James G., National Library of Medicine
Demner‐Fushman, Dina, National Library of Medicine
Affiliation: Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland, USA
Copyright Published 2022. This article is a U.S. Government work and is in the public domain in the USA. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of the Association for Information Science and Technology. This article is published under http://creativecommons.org/licenses/by-nc/4.0/.
DOI 10.1002/asi.24722
Discipline Medicine
Engineering
Library & Information Science
DocumentTitleAlternate Rae et al
EISSN 2330-1643
EndPage 218
PMCID PMC9937663
Genre researchArticle
Funding National Library of Medicine Intramural Research Program
ISSN 2330-1635
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License Attribution-NonCommercial
This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
OpenAccessLink https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasi.24722
PMID 36819642
PageCount 14
StartPage 205
SubjectTerms Algorithms
Citation indexes
Datasets
Government Libraries
Indexing
Libraries
Medical Subject Headings-MeSH
Medicine
National libraries
URI https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasi.24722
https://www.proquest.com/docview/2767386141
https://pubmed.ncbi.nlm.nih.gov/PMC9937663
Volume 74