Leveraging Large Data with Weak Supervision for Joint Feature and Opinion Word Extraction

Product feature and opinion word extraction is very important for fine granular sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words under a knowledge poor setting, in which only a few feature-opinion pairs are utilized as weak s...

Full description

Saved in:
Bibliographic Details
Published inJournal of computer science and technology Vol. 30; no. 4; pp. 903 - 916
Main Author 房磊 刘彪 黄民烈
Format Journal Article
LanguageEnglish
Published New York Springer US 01.07.2015
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Product feature and opinion word extraction is very important for fine granular sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words under a knowledge poor setting, in which only a few feature-opinion pairs are utilized as weak supervision. Our major contributions are two- fold: first, we propose a data-driven approach to represent product features and opinion words as a list of corpus-level syntactic relations, which captures rich language structures; second, we build a simple yet robust unsupervised model with prior knowledge incorporated to extract new feature and opinion words, which obtains high performance robustly. The extraction process is based upon a bootstrapping framework which, to some extent, reduces error propagation under large data. Experimental results under various settings compared with state-of-the-art baselines demonstrate that our method is effective and promising.
Bibliography:Lei Fang ,Biao Liu , Min-Lie Huang(State Key Laboratory on Intelligent Technology and Systems, Department of Computer Science and Technology Tsinghua University, Beijing 100084, China)
11-2296/TP
Product feature and opinion word extraction is very important for fine granular sentiment analysis. In this paper, we leverage large-scale unlabeled data for joint extraction of feature and opinion words under a knowledge poor setting, in which only a few feature-opinion pairs are utilized as weak supervision. Our major contributions are two- fold: first, we propose a data-driven approach to represent product features and opinion words as a list of corpus-level syntactic relations, which captures rich language structures; second, we build a simple yet robust unsupervised model with prior knowledge incorporated to extract new feature and opinion words, which obtains high performance robustly. The extraction process is based upon a bootstrapping framework which, to some extent, reduces error propagation under large data. Experimental results under various settings compared with state-of-the-art baselines demonstrate that our method is effective and promising.
opinion mining, sentiment analysis, prior knowledge, feature extraction
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1000-9000
1860-4749
DOI:10.1007/s11390-015-1569-3