An Investigation into the Role of Domain-Knowledge on the Use of Embeddings

Bibliographic Details
Published in	Inductive Logic Programming, Vol. 10759, pp. 169-183
Main Authors	Vig, Lovekesh; Srinivasan, Ashwin; Bain, Michael; Verma, Ankit
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 01.01.2018
Series	Lecture Notes in Computer Science
Subjects	Background Predicates; Baseline Representation; Deep Network; Insufficient Background; Sufficient Background

More Information
Summary:	Computing similarity in high-dimensional vector spaces is a long-standing problem that has recently seen significant progress with the invention of the word2vec algorithm. Using an embedded representation has usually been found to result in much better performance on the task being addressed. It is not known whether embeddings can similarly improve performance with data of the kind considered by Inductive Logic Programming (ILP), in which data that appear dissimilar on the surface can be similar to each other given domain (background) knowledge. In this paper, using several ILP classification benchmarks, we investigate whether embedded representations are similarly helpful for problems where there is sufficient background knowledge. We use tasks for which we have domain expertise about the relevance of the available background knowledge, and consider two subsets of background predicates (“sufficient” and “insufficient”). For each subset, we obtain a baseline representation consisting of Boolean-valued relational features. Next, a vector embedding specifically designed for classification is obtained. Finally, we examine the predictive performance of widely used classification methods with and without the embedded representation. With sufficient background knowledge, we find no statistical evidence of improved performance with an embedded representation. With insufficient background knowledge, our results provide empirical evidence that, for the specific case of deep networks, an embedded representation could be useful.
ISBN:	9783319780894 3319780891
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-319-78090-0_12
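
The comparison described in the summary (a baseline of Boolean-valued relational features, an embedding of those features, and a classifier evaluated with and without the embedding) can be sketched as below. This is only an illustration under stated assumptions, not the chapter's method: the toy data are invented, the embedding is a fixed random projection standing in for the classification-specific embedding the authors describe, the classifier is a simple nearest-centroid rule rather than any of the methods they evaluate, and no statistical test is performed. All function names are hypothetical.

```python
import random


def random_projection(dim_in, dim_out, seed=0):
    """Return a function mapping a 0/1 feature vector to a dense vector.

    Stand-in embedding only (assumption): a fixed Gaussian random projection.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]

    def embed(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return embed


def nearest_centroid_fit(X, y):
    """Compute one mean vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))

    return min(sorted(centroids), key=lambda label: sqdist(centroids[label]))


def cross_val_accuracy(X, y, folds=3):
    """Plain k-fold accuracy; example i is held out in fold i % folds."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        centroids = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(centroids, X[i]) == y[i] for i in test)
    return correct / len(X)


def compare_representations(X_bool, y, embed, folds=3):
    """Evaluate the same classifier on the baseline and embedded features."""
    baseline = cross_val_accuracy([[float(v) for v in x] for x in X_bool], y, folds)
    embedded = cross_val_accuracy([embed(x) for x in X_bool], y, folds)
    return {"baseline": baseline, "embedded": embedded}


# Invented toy data: each row is a vector of Boolean relational features.
X_bool = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
y = [1, 0, 1, 0, 1, 0]

embed = random_projection(dim_in=4, dim_out=3, seed=0)
scores = compare_representations(X_bool, y, embed)
print(scores)
```

Running the same evaluation once per subset of background predicates (one feature matrix for the "sufficient" subset, one for the "insufficient" subset) would mirror the two conditions named in the summary; how those features are constructed is not specified here and is left out of the sketch.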