How to Measure the Consistency of the Tagging of Scientific Papers?
Published in | 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 372-373 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2019 |
DOI | 10.1109/JCDL.2019.00076 |
Summary: | A collection of scientific papers is usually accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated, sometimes they are machine-generated. Evaluating tagging quality is an important problem. We propose a simple metric of tagging consistency for scientific papers: whether the tags are predictive of citations. Since authors tend to cite papers on topics close to those of their own publications, a consistent tagging should be able to predict citations. We present an algorithm to calculate consistency and show experiments with human- and machine-generated tags. We show that adding machine-generated tags to the manual ones can enhance tagging consistency. We further introduce a cross-consistency metric: the ability to predict citation links between papers tagged by different taggers, e.g. humans and computers. The cross-consistency metric can be used to evaluate the tagging quality of a tagger when the amount of data labeled by a known good tagger is limited. |
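
The summary does not spell out the algorithm, so the following is only a minimal sketch of the idea that consistent tags should predict citations. It rests on assumptions that are not taken from the paper: tag similarity is scored with Jaccard overlap, and consistency is reported as an AUC-style probability that a citing pair scores higher than a non-citing pair. The names `jaccard`, `consistency_auc`, and the toy data are hypothetical. Passing the same tag dictionary twice gives single-tagger consistency; passing two different taggers' dictionaries gives one possible reading of cross-consistency.

```python
def jaccard(a, b):
    """Jaccard overlap between two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consistency_auc(citing_tags, cited_tags, citations):
    """AUC-style score: chance that a citing pair outscores a non-citing pair.

    citing_tags, cited_tags: dict of paper id -> set of tags (assumed layout);
        use the same dict twice for one tagger, two dicts for cross-consistency.
    citations: set of (citing_id, cited_id) tuples.
    Ties between a citing and a non-citing pair count as 0.5.
    """
    pos, neg = [], []
    for a in citing_tags:
        for b in cited_tags:
            if a == b:
                continue  # a paper is never paired with itself
            score = jaccard(citing_tags[a], cited_tags[b])
            (pos if (a, b) in citations else neg).append(score)
    if not pos or not neg:
        raise ValueError("need at least one citing and one non-citing pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
human = {
    "p1": {"nlp", "parsing"},
    "p2": {"nlp", "tagging"},
    "p3": {"optics"},
    "p4": {"optics", "lasers"},
}
machine = {"p1": {"nlp"}, "p2": {"nlp"}, "p3": {"optics"}, "p4": {"optics"}}
citations = {("p2", "p1"), ("p4", "p3")}

print("human consistency:", consistency_auc(human, human, citations))
print("human-to-machine cross-consistency:", consistency_auc(human, machine, citations))
```

A score near 0.5 would mean the tags carry no information about citation links, while a score near 1.0 would mean citing pairs almost always share more tags than non-citing pairs. The exhaustive pairwise loop is quadratic in the number of papers, so a real collection would need sampling of non-citing pairs.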