Constructing and validating word similarity datasets by integrating methods from psychology, brain science and computational linguistics

Human-scored word similarity gold-standard datasets are normally composed of word pairs with corresponding similarity scores. These datasets are popular resources for evaluating word similarity models which are the essential components for many natural language processing tasks. This paper proposes...

Full description

Saved in:
Bibliographic Details
Published inSoft computing (Berlin, Germany) Vol. 22; no. 21; pp. 6967 - 6979
Main Authors Wan, Yu, Chen, Yidong, Shi, Xiaodong, Zhou, Changle
Format Journal Article
LanguageEnglish
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.11.2018
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Human-scored word similarity gold-standard datasets are normally composed of word pairs with corresponding similarity scores. These datasets are popular resources for evaluating word similarity models which are the essential components for many natural language processing tasks. This paper proposes a novel multidisciplinary method for constructing and validating word similarity gold-standard datasets. The proposed method is different from the previous ones in that it introduces methods from three different disciplines, i.e., psychology, brain science and computational linguistics to validate the soundness of the constructed datasets. Specifically, to the best of our knowledge, this is the first time event-related potentials experiments are incorporated to validate the word similarity datasets. Using the proposed method, we finally constructed a Chinese gold-standard word similarity dataset with 260 word pairs and showed its soundness using the interdisciplinary validating methods. It should be noted that, although the paper only focused on constructing Chinese standard dataset, the proposed method is applicable to other languages.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1432-7643
1433-7479
DOI:10.1007/s00500-018-3174-1