Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge

Bibliographic Details
Published in: Inductive Logic Programming, Vol. 9046, pp. 92-107
Main Authors: Natarajan, Sriraam; Picado, Jose; Khot, Tushar; Kersting, Kristian; Re, Christopher; Shavlik, Jude
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 01.01.2015
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 9783319237077
3319237071
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-23708-4_7

Summary: One of the challenges in information extraction is the requirement of human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., using knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. For example, they assume that all sentences containing Person X and Person Y are positive examples of the relation married(X, Y). In this work, we take a different approach: we infer weakly supervised examples for relations from models learned using knowledge outside the natural language task. We argue that this method creates more robust examples that are particularly useful when learning the entire information-extraction model (the structure and parameters). We demonstrate on three domains that this form of weak supervision yields superior results when learning structure, compared to using distant supervision labels or a smaller set of gold-standard labels.