Development of a Brazilian Portuguese Hotel’s Reviews Corpus

Bibliographic Details
Published in	Computational Processing of the Portuguese Language, pp. 353-361
Main Authors	de Souza, Joana Gabriela Ribeiro, de Paiva Oliveira, Alcione, Moreira, Alexandra
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	Hotel’s reviews; Linguistic corpus; Portuguese corpus; Sentiment analysis

More Information
Summary:	The provision of voluntary textual information mediated by the Internet, and particularly by Web 2.0, has provided an opportunity to create large linguistic corpora. These corpora can serve as a fundamental resource for developing natural language applications, especially those using deep learning techniques, which require large datasets. Sentiment analysis is one type of application that benefits from these resources. This article describes the creation of a corpus aimed at supporting sentiment analysis applications. It consists of reviews, written in Brazilian Portuguese, of hotels located in the Brazilian state capitals and the Federal District. The reviews were collected from TripAdvisor and have undergone normalization and part-of-speech (POS) tagging. The primary goal is to make the corpus available to the community for use in machine learning tasks involving natural language.
ISBN:	9783319997216; 3319997211
ISSN:	0302-9743; 1611-3349
DOI:	10.1007/978-3-319-99722-3_36
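The summary states only that the TripAdvisor reviews "have undergone normalization and POS tagging" and does not list the individual steps. As a hedged illustration, the Python sketch below shows one plausible normalization and tokenization pass for Portuguese review text, using only the standard library. The function names `normalize_review` and `tokenize`, and every step they perform, are assumptions for illustration and are not the authors' pipeline; POS tagging is not shown.

```python
import re
import unicodedata


def normalize_review(text: str) -> str:
    """Hypothetical normalization of one raw review (steps are assumptions)."""
    # Compose decomposed accents so "e" + U+0301 and "é" become the same code point.
    text = unicodedata.normalize("NFC", text)
    # Lowercase so "Ótimo" and "ótimo" count as the same word type.
    text = text.lower()
    # Squeeze elongations: three or more repeats of one character become a single one.
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Collapse any run of whitespace (including newlines) into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    return text


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: word tokens plus single punctuation marks."""
    # \w is Unicode-aware for str patterns, so accented Portuguese letters stay inside words.
    return re.findall(r"\w+|[^\w\s]", text)


if __name__ == "__main__":
    raw = "Hotel   ÓTIMOOOO!!!\nCafé da manhã excelente."
    clean = normalize_review(raw)
    print(clean)            # hotel ótimo! café da manhã excelente.
    print(tokenize(clean))
```

Squeezing elongations down to a single character is a deliberate simplification: it also collapses legitimate triples such as "www", so a fuller pipeline might protect URLs or known tokens before this step.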