Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value

Bibliographic Details
Published in: Advances in Natural Language Processing, pp. 71-83
Main Authors: Hamon, Thierry; Engström, Christopher; Silvestrov, Sergei
Format: Book Chapter, Conference Proceeding
Language: English
Published: Cham: Springer International Publishing, 2014
Series: Lecture Notes in Computer Science
ISBN: 3319108875
9783319108872
9783319108889
3319108883
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-10888-9_8
Summary: Term extraction methods based on linguistic rules have been proposed to help the terminology building from corpora. As they face the difficulty of identifying the relevant terms among the noun phrases extracted, statistical measures have been proposed. However, the term selection results may depend on corpus and strong assumptions reflecting specific terminological practice. We tackle this problem by proposing a parametrised C-Value which optimally considers the length and the syntactic roles of the nested terms thanks to a genetic algorithm. We compare its impact on the ranking of terms extracted from three corpora. Results show average precision increased by 9% above the frequency-based ranking and by 12% above the C-Value-based ranking.
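
For orientation, the baseline the summary compares against can be sketched as follows. This is a minimal illustration of the standard (unparametrised) C-Value and of average precision over a ranked list, not the parametrised measure or the genetic algorithm from the chapter, whose details this record does not give. The function names, the tuple-of-words term representation, and the choice to normalise average precision by the number of relevant terms found in the ranking are all assumptions made for the example.

```python
import math


def contains(longer, shorter):
    """True if `shorter` is a contiguous word sequence strictly inside `longer`."""
    n, m = len(longer), len(shorter)
    if m >= n:
        return False
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Standard C-Value for candidate terms.

    `freq` maps a term (tuple of words) to its corpus frequency.
    Simplification (assumption): nested-term frequencies are taken
    directly from `freq` rather than tracked incrementally.
    Single-word terms score 0 because log2(1) == 0.
    """
    scores = {}
    for term, f in freq.items():
        # Frequencies of the longer candidate terms that contain `term`.
        containers = [f_b for other, f_b in freq.items() if contains(other, term)]
        if containers:
            # Discount occurrences explained by longer terms.
            f_adj = f - sum(containers) / len(containers)
        else:
            f_adj = f
        scores[term] = math.log2(len(term)) * f_adj
    return scores


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant term appears."""
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data, not taken from the chapter's corpora.
freq = {("soft", "contact", "lens"): 3, ("contact", "lens"): 5}
scores = c_value(freq)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Sorting the candidates by score, as in the last line, gives the ranking that a measure such as average precision would then evaluate against a reference terminology.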