The National Library of Medicine indexer assignment dataset: A new large‐scale dataset for reviewer assignment research
Published in | Journal of the Association for Information Science and Technology Vol. 74; no. 2; pp. 205-218 |
---|---|
Main Authors | Rae, Alastair R.; Mork, James G.; Demner‐Fushman, Dina |
Format | Journal Article |
Language | English |
Published |
Hoboken, USA
John Wiley & Sons, Inc
01.02.2023
Wiley Periodicals Inc |
Subjects | |
Abstract | MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date. An important part of the MEDLINE indexing process is the assignment of articles to indexers. High quality and timely indexing is only possible when articles are assigned to indexers with suitable expertise. This article introduces the NLM indexer assignment dataset: a large dataset of 4.2 million indexer article assignments for articles indexed between 2011 and 2019. The dataset is shown to be a valuable testbed for expert matching and assignment algorithms, and indexer article assignment is also found to be useful domain‐adaptive pre‐training for the closely related task of reviewer assignment. |
---|---|
Author | Demner‐Fushman, Dina; Rae, Alastair R.; Mork, James G. |
AuthorAffiliation | Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, Maryland, USA |
Author_xml | – sequence: 1 givenname: Alastair R. orcidid: 0000-0003-4675-0627 surname: Rae fullname: Rae, Alastair R. email: alastair.rae@nih.gov organization: National Library of Medicine – sequence: 2 givenname: James G. surname: Mork fullname: Mork, James G. organization: National Library of Medicine – sequence: 3 givenname: Dina surname: Demner‐Fushman fullname: Demner‐Fushman, Dina organization: National Library of Medicine |
ContentType | Journal Article |
Copyright | Published 2022. This article is a U.S. Government work and is in the public domain in the USA. Journal of the Association for Information Science and Technology published by Wiley Periodicals LLC on behalf of Association for Information Science and Technology. This article is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.1002/asi.24722 |
Discipline | Medicine Engineering Library & Information Science |
DocumentTitleAlternate | Rae et al |
EISSN | 2330-1643 |
EndPage | 218 |
ExternalDocumentID | PMC9937663 10_1002_asi_24722 ASI24722 |
Genre | researchArticle |
GrantInformation_xml | – fundername: National Library of Medicine Intramural Research Program |
ISSN | 2330-1635 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 2 |
Language | English |
License | Attribution-NonCommercial. This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. |
Notes | Funding information: National Library of Medicine Intramural Research Program |
ORCID | 0000-0003-4675-0627 |
OpenAccessLink | https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasi.24722 |
PMID | 36819642 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | February 2023 |
PublicationDateYYYYMMDD | 2023-02-01 |
PublicationDecade | 2020 |
PublicationPlace | Hoboken, USA |
PublicationTitle | Journal of the Association for Information Science and Technology |
PublicationYear | 2023 |
Publisher | John Wiley & Sons, Inc Wiley Periodicals Inc |
StartPage | 205 |
SubjectTerms | Algorithms Citation indexes Datasets Government Libraries Indexing Libraries Medical Subject Headings-MeSH Medicine National libraries |
Title | The National Library of Medicine indexer assignment dataset: A new large‐scale dataset for reviewer assignment research |
URI | https://onlinelibrary.wiley.com/doi/abs/10.1002%2Fasi.24722 https://www.proquest.com/docview/2767386141 https://pubmed.ncbi.nlm.nih.gov/PMC9937663 |
Volume | 74 |