Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value
Published in | Advances in Natural Language Processing, pp. 71-83 |
---|---|
Main Authors | , , |
Format | Book Chapter; Conference Proceeding |
Language | English |
Published | Cham: Springer International Publishing, 2014 |
Series | Lecture Notes in Computer Science |
ISBN | 3319108875 9783319108872 9783319108889 3319108883 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-10888-9_8 |
Summary: | Term extraction methods based on linguistic rules have been proposed to support terminology building from corpora. Because these methods struggle to identify the relevant terms among the extracted noun phrases, statistical measures have been proposed. However, term selection results may depend on the corpus and on strong assumptions that reflect specific terminological practices. We tackle this problem by proposing a parametrised C-Value in which a genetic algorithm optimises how the length and the syntactic roles of nested terms are taken into account. We compare its impact on the ranking of terms extracted from three corpora. Results show that average precision increases by 9% over frequency-based ranking and by 12% over C-Value-based ranking. |
---|---|
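
The summary uses C-Value-based ranking as a baseline. For orientation, here is a minimal sketch of the standard C-Value as it is commonly defined in the term extraction literature; it is not the parametrised variant proposed in the chapter, and it involves no genetic algorithm. The function name `c_value`, the helper `contains`, and the toy frequencies are illustrative assumptions, not taken from the record.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n, m = len(longer), len(shorter)
    return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(candidates):
    """Score candidate terms with the standard (unparametrised) C-Value.

    `candidates` maps each candidate term (a tuple of words) to its corpus
    frequency. A term nested in longer candidates has the mean frequency of
    those longer candidates subtracted from its own frequency.
    Note: log2(1) == 0, so single-word candidates always score 0 here.
    """
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of the longer candidates that contain this term.
        container_freqs = [f for other, f in candidates.items() if contains(other, term)]
        if container_freqs:
            adjusted = freq - sum(container_freqs) / len(container_freqs)
        else:
            adjusted = freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Hypothetical toy frequencies, for illustration only.
candidates = {
    ("soft", "contact", "lens"): 5,
    ("contact", "lens"): 8,
}
scores = c_value(candidates)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy input, "contact lens" is nested in "soft contact lens", so its frequency is discounted before the length weighting is applied. The length weight and the nested-term discount are fixed in this sketch; according to the summary, those are the aspects the chapter's parametrised C-Value adjusts.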