How to Measure the Consistency of the Tagging of Scientific Papers?

Bibliographic Details
Published in 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 372-373
Main Author Veytsman, Boris
Format Conference Proceeding
Language English
Published IEEE 2019-06-01
DOI 10.1109/JCDL.2019.00076

Abstract A collection of scientific papers is usually accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated; sometimes they are machine-generated. Evaluating tagging quality is an important problem. We propose a simple metric of tagging consistency for scientific papers: whether the tags are predictive of citations. Since authors tend to cite papers on topics close to those of their own publications, consistent tagging should be able to predict citations. We present an algorithm to calculate consistency and show experiments with human- and machine-generated tags. We show that adding machine-generated tags to the manual ones can enhance tagging consistency. We further introduce cross-consistency metrics: the ability to predict citation links between papers tagged by different taggers, e.g., humans and computers. Cross-consistency metrics can be used to evaluate the tagging quality of a tagger when the amount of data labeled by a known good tagger is limited.
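The abstract does not spell out the consistency algorithm itself. The sketch below is one possible reading under our own assumptions, not the author's method: each paper's tags form a set, two papers are compared by the Jaccard similarity of their tag sets, and consistency is the ROC AUC of that similarity in separating linked pairs (one paper cites the other) from unlinked pairs. Cross-consistency takes the first paper's tags from one tagger and the second paper's tags from another. The function names (`jaccard`, `auc`, `cross_consistency`, `consistency`, `merge_tags`) and the toy data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical types (assumption): a tagging maps each paper ID to its tag set.
Paper = Hashable
Tagging = Dict[Paper, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auc(pos: List[float], neg: List[float]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a random
    positive score beats a random negative score, with ties counted as 1/2."""
    if not pos or not neg:
        raise ValueError("need at least one linked and one unlinked pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def cross_consistency(tags_a: Tagging, tags_b: Tagging,
                      citations: Iterable[Tuple[Paper, Paper]]) -> float:
    """AUC of tag similarity as a predictor of citation links, where the first
    paper of each pair is tagged by tagger A and the second by tagger B.
    Links are treated as undirected (assumption): i cites j or j cites i."""
    links = {frozenset(pair) for pair in citations}
    pos: List[float] = []
    neg: List[float] = []
    for i, ti in tags_a.items():
        for j, tj in tags_b.items():
            if i == j:
                continue
            score = jaccard(ti, tj)
            (pos if frozenset((i, j)) in links else neg).append(score)
    return auc(pos, neg)


def consistency(tags: Tagging, citations: Iterable[Tuple[Paper, Paper]]) -> float:
    """Single-tagger consistency. Each unordered pair is visited in both
    orders, which duplicates every score in both lists and so leaves the
    AUC unchanged."""
    return cross_consistency(tags, tags, citations)


def merge_tags(*taggings: Tagging) -> Tagging:
    """Per-paper union of several taggings, e.g. manual plus machine tags.
    Builds new sets, so the input taggings are not modified."""
    merged: Tagging = {}
    for tagging in taggings:
        for paper, paper_tags in tagging.items():
            merged.setdefault(paper, set()).update(paper_tags)
    return merged


if __name__ == "__main__":
    citations = [("p1", "p2"), ("p3", "p4")]
    human = {"p1": {"nlp", "tagging"}, "p3": {"biology"}}
    machine = {"p2": {"nlp"}, "p4": {"biology", "genomics"}}
    print(cross_consistency(human, machine, citations))        # 1.0
    print(consistency(merge_tags(human, machine), citations))  # 1.0
```

Exhaustive pair enumeration is quadratic in the number of papers; for a large collection one would likely sample unlinked pairs instead, which is again our assumption rather than something the abstract specifies.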

Affiliation Chan Zuckerberg Initiative
EISBN 1728115477, 9781728115474
Genre orig-research
PageCount 2
SubjectTerms tagging; tagging evaluation; topic modeling
URI https://ieeexplore.ieee.org/document/8791168