A Benchmark and Scoring Algorithm for Enriching Arabic Synonyms
The 12th International Global Wordnet Conference (GWC2023), Global Wordnet Association. San Sebastian, Spain, 2023.
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 04.02.2023 |
Summary: | The 12th International Global Wordnet Conference (GWC2023), Global Wordnet Association. San Sebastian, Spain, 2023. This paper addresses the task of extending a given synset with additional synonyms, taking into account synonymy strength as a fuzzy value. Given a mono/multilingual synset and a threshold (a fuzzy value in [0, 1]), our goal is to extract new synonyms above this threshold from existing lexicons. We present two contributions: an algorithm and a benchmark dataset. The dataset consists of 3K candidate synonyms for 500 synsets. Each candidate synonym is annotated with a fuzzy value by four linguists. The dataset is important for (i) understanding how much linguists agree or disagree on synonymy and (ii) serving as a baseline to evaluate our algorithm. Our proposed algorithm extracts synonyms from existing lexicons and computes a fuzzy value for each candidate. Our evaluations show that the algorithm behaves like a linguist: its fuzzy values are close to those proposed by linguists, as measured by RMSE and MAE. The dataset and a demo page are publicly available at https://portal.sina.birzeit.edu/synonyms. |
---|---|
DOI: | 10.48550/arxiv.2302.02232 |
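
The summary mentions two mechanics that a short example may make concrete: keeping candidates whose fuzzy value is above a threshold, and comparing algorithm scores to linguist scores with RMSE and MAE. The Python sketch below is illustrative only and is not the paper's algorithm. The function names (`filter_candidates`, `mae`, `rmse`), the strict comparison at the threshold, and every number in the usage lines are assumptions made for the example; none of the values come from the dataset.

```python
import math


def filter_candidates(scored, threshold):
    """Keep candidates whose fuzzy value is above the threshold.

    `scored` maps a candidate synonym to a fuzzy value in [0, 1].
    Assumption: the comparison is strict ("above"); the summary does not
    say whether the boundary itself is included.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return {word: score for word, score in scored.items() if score > threshold}


def _check_lengths(predicted, reference):
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(predicted, reference):
    """Mean absolute error between two equally long score lists."""
    _check_lengths(predicted, reference)
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rmse(predicted, reference):
    """Root mean squared error between two equally long score lists."""
    _check_lengths(predicted, reference)
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical toy values, invented for illustration (not from the dataset).
algorithm_scores = {"cand_a": 0.9, "cand_b": 0.4, "cand_c": 0.65}
linguist_scores = {"cand_a": 0.8, "cand_b": 0.5, "cand_c": 0.65}

kept = filter_candidates(algorithm_scores, threshold=0.6)
words = sorted(algorithm_scores)
predicted = [algorithm_scores[w] for w in words]
reference = [linguist_scores[w] for w in words]

print(sorted(kept))
print(round(mae(predicted, reference), 4), round(rmse(predicted, reference), 4))
```

Lower RMSE and MAE mean the computed fuzzy values sit closer to the linguists' annotations. RMSE penalizes large individual deviations more heavily than MAE, so reporting both gives a fuller picture.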