How to Measure the Consistency of the Tagging of Scientific Papers?

Bibliographic Details
Published in	2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) pp. 372 - 373
Main Author	Veytsman, Boris
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2019
Subjects	tagging; tagging evaluation; topic modeling
DOI	10.1109/JCDL.2019.00076

More Information
Summary:	A collection of scientific papers is usually accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated; sometimes they are machine-generated. Evaluating tagging quality is an important problem. We propose a simple metric of tagging consistency for scientific papers: whether the tags are predictive of citations. Since authors tend to cite papers on topics close to those of their own publications, consistent tagging should be able to predict citations. We present an algorithm to calculate consistency and show experiments with human- and machine-generated tags. We show that adding machine-generated tags to manual ones can enhance tagging consistency. We further introduce cross-consistency metrics: the ability to predict citation links between papers tagged by different taggers, e.g., humans and computers. Cross-consistency metrics can be used to evaluate the tagging quality of a tagger when the amount of data labeled by a known good tagger is limited.
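The record does not spell out the consistency algorithm, so the following is only a minimal sketch of the idea under our own assumptions, not the paper's method. Two choices are ours alone: tag similarity between two papers is the Jaccard overlap of their tag sets, and "predictive of citations" is scored as a ROC AUC, i.e. how often a citing pair outranks a non-citing pair. The names `jaccard`, `auc`, and `consistency` are hypothetical. Passing the same tag dictionary twice gives a consistency score. Passing one tagger's tags for citing papers and another's for cited papers gives a cross-consistency score.

```python
from typing import Dict, Iterable, Set, Tuple

Tags = Dict[str, Set[str]]  # paper id -> set of tags (hypothetical layout)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Tag-set overlap (assumed similarity measure); 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auc(pos: Iterable[float], neg: Iterable[float]) -> float:
    """Probability that a random citing pair outscores a random non-citing pair.

    Ties count one half (Mann-Whitney form of ROC AUC).
    """
    pos, neg = list(pos), list(neg)
    if not pos or not neg:
        raise ValueError("need at least one citing and one non-citing pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consistency(citing_tags: Tags, cited_tags: Tags,
                citations: Set[Tuple[str, str]]) -> float:
    """Score how well tag similarity separates citing pairs from non-citing pairs.

    citing_tags: tags for the citing papers (tagger A).
    cited_tags:  tags for the cited papers (tagger B; same as A for plain consistency).
    citations:   set of (citing_id, cited_id) links.
    """
    pos, neg = [], []
    for u, tu in citing_tags.items():
        for v, tv in cited_tags.items():
            if u == v:
                continue  # skip self-pairs: a paper does not cite itself
            score = jaccard(tu, tv)
            (pos if (u, v) in citations else neg).append(score)
    return auc(pos, neg)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    manual = {
        "p1": {"topic modeling", "lda"},
        "p2": {"topic modeling", "evaluation"},
        "p3": {"digital libraries", "metadata"},
        "p4": {"metadata", "cataloging"},
    }
    machine = {
        "p1": {"topic modeling"},
        "p2": {"topic modeling"},
        "p3": {"metadata"},
        "p4": {"metadata"},
    }
    citations = {("p2", "p1"), ("p4", "p3")}
    print(consistency(manual, manual, citations))   # consistency of manual tags: 0.9
    print(consistency(manual, machine, citations))  # cross-consistency, manual -> machine: 0.9
```

AUC is used here rather than plain accuracy because citation links are sparse, so a score dominated by the many non-citing pairs would say little. Scoring every ordered pair is quadratic in the number of papers; for a large collection one would likely sample non-citing pairs instead.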