Cross-Domain Keyword Extraction with Keyness Patterns
Main Authors | , |
---|---|
Format | Journal Article |
Language | English |
Published | 27.09.2024 |
Subjects | |
Summary: | Domain dependence and annotation subjectivity pose challenges for supervised
keyword extraction. Based on the premises that second-order keyness patterns
exist at the community level and are learnable from annotated keyword
extraction datasets, this paper proposes a supervised ranking approach to
keyword extraction that ranks keywords with keyness patterns consisting of
independent features (such as sublanguage domain and term length) and three
categories of dependent features: heuristic features, specificity features,
and representativity features. The approach uses two models based on
convolutional neural networks to learn keyness patterns from keyword datasets
and overcomes annotation subjectivity by training the two models with a
bootstrap sampling strategy. Experiments demonstrate that the approach not only
achieves state-of-the-art performance on ten keyword datasets in general
supervised keyword extraction, with an average top-10-F-measure of 0.316, but
also shows robust cross-domain performance, with an average top-10-F-measure of
0.346 on four datasets that are excluded from the training process. Such
cross-domain robustness is attributed to three factors: community-level keyness
patterns are limited in number and moderately independent of language domains;
independent features are distinguished from dependent features; and the
sampling training strategy balances excess risk against the lack of negative
training data. |
---|---|
DOI: | 10.48550/arxiv.2409.18724 |
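
The summary reports results as a top-10-F-measure. As a rough illustration of how such a score is commonly computed for one document, here is a minimal Python sketch. It is not the paper's evaluation code. The function name `top_k_f_measure`, the exact string matching, and the use of the number of returned candidates as the precision denominator are all assumptions; the paper may normalize or stem terms before comparing them.

```python
def top_k_f_measure(ranked, gold, k=10):
    """F-measure of the top-k ranked candidates against gold keywords.

    Hypothetical helper: assumes exact string matching between
    predicted and gold keywords, with no stemming or normalization.
    """
    top = ranked[:k]          # keep only the k highest-ranked candidates
    gold_set = set(gold)
    hits = len(set(top) & gold_set)
    if hits == 0:
        return 0.0            # also covers empty predictions or empty gold
    precision = hits / len(top)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up keywords (not data from the paper).
ranked = ["keyword extraction", "cnn", "ranking", "domain"]
gold = ["keyword extraction", "ranking", "bootstrap"]
score = top_k_f_measure(ranked, gold)
```

In this toy case two of the four candidates match, so precision is 1/2, recall is 2/3, and the score is 4/7. An average over a dataset would be the mean of this per-document score.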