A hybrid model to improve IC-related metrics of semantic similarity between words
Published in | Complex & intelligent systems Vol. 10; no. 5; pp. 6339 - 6377 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Cham; Springer International Publishing; 01.10.2024; Springer Nature B.V; Springer |
Subjects | |
Summary: | This paper proposes a hybrid model, named IC+SP, to improve Information Content (IC) related metrics of semantic similarity between words. It rests on the essential hypothesis that IC and the shortest path are two relatively independent sources of semantic evidence with approximately equal influence on the semantic similarity metric. The paradigm of IC+SP is to linearly combine the IC-related metric and the shortest path. A transformation from the semantic similarity of concepts to that of words is also presented, obtained by maximizing every component of IC+SP. Thirteen improved IC-related metrics based on IC+SP are formed and implemented on the experimental platform HESML (Lastra-Díaz, Inf Syst 66:97–118, 2017). To evaluate IC+SP, Pearson’s and Spearman’s correlation coefficients of the improved metrics on well-accepted benchmarks are compared with those of the original ones. I introduce the Wilcoxon Signed-Rank Test, which needs no normal-distribution hypothesis, whereas this hypothesis is required by the T-Test on small samples. Both the T-Test and the Wilcoxon Signed-Rank Test are conducted on the differences between the correlation coefficients of the improved and original metrics. The improved IC-related metrics are expected to significantly outperform their corresponding original ones, and the experimental results, including comparisons of the mean and maximum correlation coefficients as well as the p-value and confidence interval of both tests, meet this expectation in the vast majority of cases. |
---|---|
ISSN: | 2199-4536 2198-6053 |
DOI: | 10.1007/s40747-024-01496-y |
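
The summary describes IC+SP only in outline: a linear combination of an IC-related metric and the shortest path, lifted from concepts to words by maximizing each component. The following is a minimal sketch of that outline, not the paper's actual formulas. The function names, the equal weight of 0.5 (suggested by the "approximately equal influence" hypothesis), the assumption that both components are already scaled to [0, 1], and the toy lookup values are all hypothetical.

```python
from itertools import product


def combine(ic_sim, sp_sim, weight=0.5):
    """Linearly combine an IC-based score and a shortest-path-based score.

    weight=0.5 is an assumption reflecting "approximately equal influence";
    both inputs are assumed to be similarities already scaled to [0, 1].
    """
    return weight * ic_sim + (1.0 - weight) * sp_sim


def word_similarity(concepts_a, concepts_b, ic_metric, sp_metric, weight=0.5):
    """Word-level similarity from concept-level scores.

    Each component is maximized separately over all concept pairs, so the
    best IC score and the best path score may come from different pairs.
    """
    pairs = list(product(concepts_a, concepts_b))
    if not pairs:
        return 0.0  # a word with no known concepts gets the lowest score
    best_ic = max(ic_metric(a, b) for a, b in pairs)
    best_sp = max(sp_metric(a, b) for a, b in pairs)
    return combine(best_ic, best_sp, weight)


# Hypothetical concept-level scores for two senses of one word against
# a single sense of another; the numbers are made up for illustration.
IC_TABLE = {("bank#1", "shore#1"): 0.25, ("bank#2", "shore#1"): 0.75}
SP_TABLE = {("bank#1", "shore#1"): 0.5, ("bank#2", "shore#1"): 0.25}

score = word_similarity(
    ["bank#1", "bank#2"],
    ["shore#1"],
    ic_metric=lambda a, b: IC_TABLE[(a, b)],
    sp_metric=lambda a, b: SP_TABLE[(a, b)],
)
print(score)  # 0.5 * 0.75 + 0.5 * 0.5 = 0.625
```

In this toy case the best IC score comes from the second sense and the best path score from the first, which is what maximizing every component separately allows. A real implementation would take its concept-level scores from a taxonomy such as WordNet rather than from lookup tables.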