Distant Supervision for Relation Extraction via Sparse Representation



Bibliographic Details
Published in	Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, pp. 151-162
Main Authors	Zeng, Daojian; Lai, Siwei; Wang, Xuepeng; Liu, Kang; Zhao, Jun; Lv, Xueqiang
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	Feature Vector; Label Data; Natural Language Processing; Noise Term; Sparse Representation

More Information
Summary:	In relation extraction, distant supervision has been proposed to automatically generate large amounts of labeled data. Distant supervision heuristically aligns a given knowledge base to free text and treats the alignment as labeled data. This procedure is an effective way to obtain training data. However, the heuristic labeling procedure produces wrong labels, so the extracted features are noisy and lead to poor extraction performance. In this paper, we exploit sparse representation to address the noisy feature problem. Given a new test feature vector, we first compute its sparse representation as a linear combination of all the training features. To reduce the influence of noisy features, a noise term is included when finding the sparse solution. Then, the residuals with respect to each class are computed. Finally, we classify the test sample by assigning it to the class with the minimal residual. Experimental results demonstrate that the noise term is effective against noisy features and that our approach significantly outperforms state-of-the-art methods.
ISBN:	9783319122762; 3319122762
ISSN:	0302-9743; 1611-3349
DOI:	10.1007/978-3-319-12277-9_14
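
The Summary describes the classification procedure only in outline, so the following is a minimal sketch of one way it could be realized, not the chapter's actual implementation. It assumes the sparse solution is found by L1-regularized least squares over the training features stacked with an identity block (the identity columns play the role of the noise term), solved here with plain iterative soft-thresholding. The function names, the parameters `lam` and `n_iter`, and the toy data are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code_with_noise(A, y, lam=0.01, n_iter=500):
    """Approximately solve min_w 0.5*||y - [A, I] w||^2 + lam*||w||_1 by ISTA.

    A has one training feature vector per column. The result is split into
    x (coefficients over training samples) and e (the noise term).
    This objective is an assumption; the record does not state the exact one.
    """
    d, n = A.shape
    B = np.hstack([A, np.eye(d)])             # training features plus noise basis
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(n + d)
    for _ in range(n_iter):
        grad = B.T @ (B @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w[:n], w[n:]


def classify(A, labels, y, lam=0.01, n_iter=500):
    """Assign y to the class whose training columns give the smallest residual."""
    labels = np.asarray(labels)
    x, e = sparse_code_with_noise(A, y, lam, n_iter)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)   # keep only coefficients of class c
        residuals[c] = np.linalg.norm(y - e - A @ x_c)
    best = min(residuals, key=residuals.get)
    return best, residuals


# Toy data (hypothetical): two classes, two unit-norm training vectors each.
cols = np.array([
    [1, 1, 1.0, 0, 0, 0],
    [1, 1, 0.8, 0, 0, 0],
    [0, 0, 0, 1, 1.0, 1],
    [0, 0, 0, 1, 0.8, 1],
], dtype=float).T
A = cols / np.linalg.norm(cols, axis=0)
labels = np.array([0, 0, 1, 1])

# Test vector: a class-0 sample with one corrupted (noisy) feature.
y = A[:, 0].copy()
y[5] += 0.5

label, residuals = classify(A, labels, y)
print(label, residuals)
```

In this sketch the corrupted feature is meant to be absorbed by the noise term `e` rather than by the coefficients of the wrong class, which is the role the Summary attributes to the noise term.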