TRSAv1: A new benchmark dataset for classifying user reviews on Turkish e-commerce websites


Bibliographic Details
Published in Journal of Information Science Vol. 49; no. 6; pp. 1711-1725
Main Authors Aydoğan, Murat; Kocaman, Veysel
Format Journal Article
Language English
Published London, England SAGE Publications 01.12.2023
Bowker-Saur Ltd
Summary: The amount of data produced has increased significantly with the development of Internet technologies. Accordingly, the importance of natural language processing studies has grown, and the topic has become one of the most studied subjects in artificial intelligence. Although it is a popular and widely studied topic, not enough studies have been conducted on the Turkish language. Even the studies conducted in Turkey are primarily on English and other natural languages instead of Turkish. The lack of a Turkish dataset is the most crucial reason for the lack of studies. Therefore, as a solution, user reviews on e-commerce websites were collected and labelled as positive, negative or neutral, and a new and unique dataset consisting of 150,000 reviews was created. This dataset, named TRSAv1, was publicly shared with researchers and will contribute to Turkish natural language processing studies. In addition, the effect of different word representation methods on algorithm performance was examined in detail, and the results were compared.
ISSN: 0165-5515, 1741-6485
DOI: 10.1177/01655515221074328