Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value

Bibliographic Details
Published in	Advances in Natural Language Processing, pp. 71-83
Main Authors	Hamon, Thierry; Engström, Christopher; Silvestrov, Sergei
Format	Book Chapter; Conference Proceeding
Language	English
Published	Cham: Springer International Publishing, 2014
Series	Lecture Notes in Computer Science
Subjects	genetic algorithm; Mathematics/Applied Mathematics; term extraction; term ranking; Terminology
ISBN	3319108875 9783319108872 9783319108889 3319108883
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-319-10888-9_8

Summary:	Term extraction methods based on linguistic rules have been proposed to support terminology building from corpora. Because these methods struggle to identify the relevant terms among the extracted noun phrases, statistical measures have been proposed. However, term selection results may depend on the corpus and on strong assumptions reflecting specific terminological practice. We tackle this problem by proposing a parametrised C-Value in which the weighting of the length and the syntactic roles of nested terms is optimised by a genetic algorithm. We compare its impact on the ranking of terms extracted from three corpora. Results show that average precision increases by 9% over the frequency-based ranking and by 12% over the C-Value-based ranking.
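
For readers unfamiliar with the measures named in the summary, the following is a minimal sketch of the standard, unparametrised C-Value and of average precision over a ranked term list. It is not the chapter's parametrised variant, whose parameters are not given in this record. The function names, the tuple-of-words term representation, and the `containers` mapping are assumptions made for illustration.

```python
import math


def c_value(term, freq, containers):
    """Standard C-Value of a multi-word candidate term (illustrative sketch).

    term:       tuple of words, e.g. ("contact", "lens")
    freq:       dict mapping each candidate term to its corpus frequency
    containers: dict mapping a term to the longer candidate terms containing it
                (hypothetical structure chosen for this example)
    """
    length_weight = math.log2(len(term))
    longer = containers.get(term, [])
    if not longer:
        # The term never appears nested in a longer candidate.
        return length_weight * freq[term]
    # Discount the mean frequency of the longer candidates containing the term.
    nested_freq = sum(freq[b] for b in longer) / len(longer)
    return length_weight * (freq[term] - nested_freq)


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant terms.

    Normalises by the number of relevant terms, one common convention.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy data, invented for illustration only.
freq = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
containers = {("contact", "lens"): [("soft", "contact", "lens")]}
nested_score = c_value(("contact", "lens"), freq, containers)
```

A genetic algorithm, as described in the summary, would then search over weights applied to such a score and keep the settings that maximise average precision on a reference term list.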