Learning to extract domain-specific relations from complex sentences

Bibliographic Details
Published in Expert systems with applications Vol. 60; pp. 107-117
Main Authors Tan, Saravadee Sae; Lim, Tek Yong; Soon, Lay-Ki; Tang, Enya Kong
Format Journal Article
Language English
Published Elsevier Ltd 30.10.2016
Summary:
•We propose SemIE, Semantic-based Information Extraction and Mapping.
•Our approach identifies significant relations and maps them to a semantic structure.
•Our approach bootstraps training examples from a pair of structured documents.
•The results show our approach outperforms a current state-of-the-art system.
•The results prove the effectiveness of our approach in handling complex sentences.

Open Information Extraction (OIE) systems focus on identifying and extracting general relations from text. Most OIE systems use simple linguistic structure, such as part-of-speech or dependency features, to extract relations and arguments from a sentence. These approaches are simple and fast to implement, but suffer from two main drawbacks: i) they are less effective at handling complex sentences with multiple relations and shared arguments, and ii) they tend to extract overly specific relations. This paper proposes an approach to Information Extraction called SemIE, which addresses both drawbacks. SemIE identifies significant relations in domain-specific text by using a semantic structure that describes the domain of discourse. SemIE exploits the predicate-argument structure of a text, which lets it handle complex sentences. The semantics of the arguments are made explicit by mapping them to relevant concepts in the semantic structure. SemIE uses a semi-supervised learning approach to bootstrap training examples that cover all relations expressed in the semantic structure: it takes pairs of structured documents as input and uses a Greedy Mapping module to bootstrap a full set of training examples. The training examples are then used to learn the extraction and mapping rules. We evaluated the performance of SemIE by comparing it with OLLIE, a state-of-the-art OIE system. We tested SemIE and OLLIE on the task of extracting relations from text in the “movie” domain and found that, on average, SemIE outperforms OLLIE. We also examined how performance varies with sentence complexity and sentence length. The results prove the effectiveness of SemIE in handling complex sentences.
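
The idea in the summary of mapping predicate-argument structure onto a domain semantic structure can be illustrated with a toy sketch. This is not the authors' implementation: the frames are written by hand rather than produced by a parser, and every name here (`SEMANTIC_STRUCTURE`, `PREDICATE_TO_RELATION`, `CONCEPT_OF`, `Frame`, `map_frame`) is a hypothetical stand-in chosen for illustration. The frames correspond to a sentence such as "Inception, directed by Christopher Nolan, stars Leonardo DiCaprio", which carries two relations that share one argument.

```python
from dataclasses import dataclass

# Hypothetical semantic structure for a "movie" domain:
# relation name -> (concept of the subject, concept of the object).
SEMANTIC_STRUCTURE = {
    "directedBy": ("Movie", "Person"),
    "starring": ("Movie", "Person"),
}

# Hypothetical lexicon linking predicate lemmas to domain relations.
PREDICATE_TO_RELATION = {"direct": "directedBy", "star": "starring"}

# Hypothetical concept lookup for argument strings.
CONCEPT_OF = {
    "Inception": "Movie",
    "Christopher Nolan": "Person",
    "Leonardo DiCaprio": "Person",
}


@dataclass(frozen=True)
class Frame:
    """One predicate with its arguments (assumed to be given, in any order)."""
    predicate: str
    arguments: tuple


def map_frame(frame):
    """Map a frame to a (subject, relation, object) triple, or None.

    Frames whose predicate is outside the domain lexicon are dropped,
    which is how this toy keeps only domain-relevant relations.
    Assumes subject and object concepts differ for each relation.
    """
    relation = PREDICATE_TO_RELATION.get(frame.predicate)
    if relation is None:
        return None
    subj_concept, obj_concept = SEMANTIC_STRUCTURE[relation]
    subj = next((a for a in frame.arguments if CONCEPT_OF.get(a) == subj_concept), None)
    obj = next((a for a in frame.arguments if CONCEPT_OF.get(a) == obj_concept), None)
    if subj is None or obj is None:
        return None
    return (subj, relation, obj)


# Two in-domain frames sharing the argument "Inception", plus one
# out-of-domain frame that the mapping should discard.
frames = [
    Frame("direct", ("Christopher Nolan", "Inception")),
    Frame("star", ("Inception", "Leonardo DiCaprio")),
    Frame("say", ("critics", "Inception")),
]
triples = [t for t in map(map_frame, frames) if t is not None]
```

Because the arguments are assigned by concept rather than by surface position, the shared argument "Inception" ends up as the subject of both triples even though it appears in a different slot in each frame.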
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2016.05.004