Using unsupervised information to improve semi-supervised tweet sentiment classification

Bibliographic Details
Published in: Information sciences, Vol. 355-356, pp. 348-365
Main Authors: da Silva, Nádia Félix Felipe; Coletta, Luiz F.S.; Hruschka, Eduardo R.; Hruschka Jr, Estevam R.
Format: Journal Article
Language: English
Published: Elsevier Inc, 10.08.2016
Summary: Supervised algorithms require a set of representative labeled data for building classification models. However, labeled data are usually difficult and expensive to obtain, which motivates the interest in semi-supervised learning. This type of learning uses both labeled and unlabeled data in the training process and is particularly useful in applications such as tweet sentiment analysis, where a large amount of unlabeled data is available. Semi-supervised learning for tweet sentiment analysis, although quite appealing, is relatively new. We propose a semi-supervised learning framework that combines unsupervised information, captured from a similarity matrix constructed from unlabeled data, with a classifier. Our motivation is that such a similarity matrix is a powerful knowledge-discovery tool that can help classify unlabeled tweet sets. Our framework makes use of the well-known Self-training algorithm to induce a better tweet sentiment classifier. Experimental results on real-world datasets demonstrate that the proposed framework can improve the accuracy of tweet sentiment analysis.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2016.02.002
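
Illustrative sketch: the summary describes combining a similarity matrix built from unlabeled tweets with a classifier inside a self-training loop. The record gives no algorithmic detail, so the code below is only a generic illustration of that idea and not the authors' method. The toy similarity-based classifier, the neighbour-weighted smoothing step, and the names `self_train`, `alpha`, `threshold`, and `max_rounds` are all assumptions made for this example.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector for a tweet (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classifier_scores(vec, labeled):
    """Toy classifier (assumption): mean similarity to each class, normalised."""
    sums, counts = {}, {}
    for v, y in labeled:
        sums[y] = sums.get(y, 0.0) + cosine(vec, v)
        counts[y] = counts.get(y, 0) + 1
    raw = {y: sums[y] / counts[y] for y in sums}
    total = sum(raw.values())
    if total == 0:
        return {y: 1.0 / len(raw) for y in raw}  # no evidence: uniform scores
    return {y: s / total for y, s in raw.items()}


def self_train(labeled_texts, unlabeled_texts, alpha=0.5, threshold=0.6,
               max_rounds=10):
    """Self-training where classifier scores are smoothed with a similarity
    matrix computed over the unlabeled tweets (hypothetical combination rule)."""
    labeled = [(bow(t), y) for t, y in labeled_texts]
    vecs = [bow(t) for t in unlabeled_texts]
    n = len(vecs)
    # Unsupervised information: pairwise similarity of unlabeled tweets.
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    assigned = {}
    for _ in range(max_rounds):
        remaining = [i for i in range(n) if i not in assigned]
        if not remaining:
            break
        base = {i: classifier_scores(vecs[i], labeled) for i in remaining}
        newly = {}
        for i in remaining:
            others = [j for j in remaining if j != i]
            wsum = sum(sim[i][j] for j in others)
            combined = {}
            for y, score in base[i].items():
                # Similarity-weighted average of the neighbours' scores.
                neigh = (sum(sim[i][j] * base[j][y] for j in others) / wsum
                         if wsum else score)
                combined[y] = (1 - alpha) * score + alpha * neigh
            label = max(combined, key=combined.get)
            if combined[label] >= threshold:
                newly[i] = label  # confident enough to pseudo-label
        if not newly:
            break
        for i, y in newly.items():
            assigned[i] = y
            labeled.append((vecs[i], y))  # grow the training set
    return assigned


if __name__ == "__main__":
    seed = [("i love this great phone", "pos"),
            ("awful terrible service i hate it", "neg")]
    pool = ["love this great movie", "terrible awful day", "movie tonight"]
    print(self_train(seed, pool))
```

In the toy data, "movie tonight" shares no words with either labeled tweet, so the classifier alone scores it evenly; it is pseudo-labeled only because it resembles another unlabeled tweet that the classifier scores confidently. The weight `alpha` controls how much the similarity matrix influences each decision.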