Automatic Creation of a Domain Specific Thesaurus Using Siamese Networks

Bibliographic Details
Published in: 2021 IEEE 15th International Conference on Semantic Computing (ICSC), pp. 355-361
Main Authors: Dhaliwal, Mehak Preet; Tiwari, Hemant; Vala, Vanraj
Format: Conference Proceeding
Language: English
Published: IEEE, 01.01.2021
Summary: Search technologies across applications are shifting from syntactic and lexical matching toward semantic methods, which aim to understand the intent and contextual meaning of search queries in order to yield more relevant and accurate results. Such methods often rely on semantic ontologies to map query words to concepts and aid in query expansion. However, most applications require a domain-specific language definition to overcome ambiguity and misinterpretation of meaning. General-purpose ontologies often fall short here and fail to yield appropriate results in specific applications. In this paper, we propose a novel method of building a domain-specific thesaurus for semantic search: we automatically create a refined general thesaurus, then train a Siamese network in two phases to classify candidate synonyms as relevant or non-relevant to the particular domain. We focus on the application of tag-based gallery image retrieval, and we extract and use information from Google's Conceptual Captions dataset to improve our model's performance. To investigate and justify our training method and architecture, we conduct an ablation study and compare its results against our model. We further demonstrate, analytically and empirically, the advantage of representing terms in a domain-specific environment through semantic vectors fine-tuned on domain-related corpora. Although our experiments focus on building a word ontology specific to image retrieval, our method is generic and can be generalised to any field requiring a domain-specific semantic language.
DOI: 10.1109/ICSC50631.2021.00066
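
Illustration: The summary describes a Siamese network that classifies candidate synonym pairs as relevant or non-relevant to a domain. The record does not give the paper's architecture, so the following is only a generic sketch of the Siamese idea, not the authors' model: one shared encoder is applied to both word vectors, and a symmetric comparison feeds a logistic output. The function names, dimensions, toy vectors, placeholder weights, and the 0.5 threshold are all assumptions made for illustration, and the two-phase training mentioned in the summary is not shown.

```python
import math
import random


def make_encoder(in_dim, hidden_dim, seed=0):
    """Create one weight matrix; both inputs share it (the Siamese property)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(hidden_dim)]


def encode(weights, vec):
    """Apply the shared linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def relevance_score(weights, out_weights, bias, vec_a, vec_b):
    """Return a score in (0, 1) for a candidate synonym pair."""
    ha, hb = encode(weights, vec_a), encode(weights, vec_b)
    # Symmetric pair features: absolute difference and elementwise product,
    # so the score does not depend on the order of the two words.
    features = ([abs(a - b) for a, b in zip(ha, hb)]
                + [a * b for a, b in zip(ha, hb)])
    logit = sum(w * f for w, f in zip(out_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 4-dimensional vectors (assumed); real inputs would be semantic word vectors.
encoder = make_encoder(in_dim=4, hidden_dim=3)
out_weights = [0.1] * 6  # untrained placeholder weights (assumed)
bias = 0.0
beach = [0.9, 0.1, 0.0, 0.2]
shore = [0.8, 0.2, 0.1, 0.2]

score = relevance_score(encoder, out_weights, bias, beach, shore)
is_relevant = score >= 0.5  # hypothetical decision threshold
```

Because the weights here are untrained, the score itself carries no meaning; the sketch only shows how a shared encoder and a symmetric comparison produce a single relevant/non-relevant decision per candidate pair.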