Outlier Detection on Semantic Space for Sentiment Analysis With Convolutional Neural Networks

Sentiment analysis is a text categorization problem that consists in automatically assigning text documents to pre- defined classes that represent sentiments or a positive/negative opinion about a subject. To solve this task, machine learning techniques can be used. However, in order to achieve good...

Full description

Saved in:
Bibliographic Details
Published in2018 International Joint Conference on Neural Networks (IJCNN) pp. 1 - 8
Main Authors Schmitt, Murilo Falleiros Lemos, Spinosa, Eduardo J.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.07.2018
Subjects
Online AccessGet full text
ISSN2161-4407
DOI10.1109/IJCNN.2018.8489200

Cover

Loading…
More Information
Summary:Sentiment analysis is a text categorization problem that consists in automatically assigning text documents to pre- defined classes that represent sentiments or a positive/negative opinion about a subject. To solve this task, machine learning techniques can be used. However, in order to achieve good gen- eralization, these techniques require a thorough pre-processing and an apropriate data representation. To deal with these fundamental issues, this work proposes the use of convolutional neural networks and density-based clustering algorithms. The word representations used in this work were obtained from vectors previously trained in an unsupervised way, denominated word embeddings. These representations are able to capture syntactic and semantic information of words, which leads to similar words to be projected closer together in the semantic space. In this scenario, in order to improve the performance of the convolutional neural network, the use of a clustering algorithm in the semantic space to extract additional information from the data is proposed. A density-based clustering algorithm was used to detect and remove outliers from the documents to be classified before these documents were used to train the con- volutional neural network. We conducted experiments with two different embeddings across three datasets in order to validate the effectiveness of our method. Results show that removing outliers from documents is capable of slightly improving the accuracy of the model and reducing computational cost for the non-static training approach. (0)
ISSN:2161-4407
DOI:10.1109/IJCNN.2018.8489200