Annotation of Text Corpora by Sentiment and Irony in a Project of Citizen Science

Bibliographic Details
Published in: Automatic Control and Computer Sciences, Vol. 58, No. 7, pp. 797–807
Main Authors: Paramonov, I. V.; Poletaev, A. Y.
Format: Journal Article
Language: English
Published: Moscow: Pleiades Publishing; Springer Nature B.V., 01.12.2024
ISSN: 0146-4116; 1558-108X
DOI: 10.3103/S0146411624700263

Summary: This paper describes the construction of three corpora: a corpus of sentences annotated by general sentiment into four classes (positive, negative, neutral, and mixed), a corpus of phrasemes annotated by sentiment into three classes (positive, negative, and neutral), and a corpus of sentences annotated for the presence or absence of irony. The annotation is carried out by volunteers in the project Preparing Texts for Algorithms on the People of Science website. For each task, guidelines for the annotators are compiled based on existing knowledge of the subject area. A methodology for the statistical processing of the annotation results is also developed, based on analyzing the distributions of the labels assigned by different annotators and the agreement between them. For annotating sentences by irony and phrasemes by sentiment, agreement is quite high (full agreement rate of 0.60–0.99), whereas for annotating sentences by general sentiment it is low (full agreement rate of 0.40), apparently because that task is more complex. It is also shown that the performance of automatic sentence sentiment analysis algorithms improves by 12–13% when they use a corpus containing only the sentences on which all annotators (3–5 people) agree, compared with a corpus annotated by a single volunteer.
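The summary does not define the "full agreement rate" or give the exact statistics used. The following is a minimal sketch, assuming the rate is the share of items on which every annotator chose the same label, and that the "all annotators agree" corpus is simply the subset of unanimously labeled items. The function names `full_agreement_rate`, `unanimous_subset`, and `label_distribution` are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helpers; assumes annotations[i] lists the labels that the
# annotators of item i assigned (the number of annotators may vary, e.g. 3 to 5).


def full_agreement_rate(annotations: Sequence[Sequence[Hashable]]) -> float:
    """Share of items on which all annotators chose the same label (assumed definition)."""
    if not annotations:
        raise ValueError("no annotated items")
    unanimous = sum(1 for labels in annotations if len(set(labels)) == 1)
    return unanimous / len(annotations)


def unanimous_subset(
    texts: Sequence[str], annotations: Sequence[Sequence[Hashable]]
) -> List[Tuple[str, Hashable]]:
    """Keep only items with unanimous labels, paired with the agreed label."""
    return [
        (text, labels[0])
        for text, labels in zip(texts, annotations)
        if len(set(labels)) == 1
    ]


def label_distribution(annotations: Sequence[Sequence[Hashable]]) -> dict:
    """Relative frequency of each label across all individual annotations."""
    counts = Counter(label for labels in annotations for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Toy data invented for illustration only.
    sentences = ["s1", "s2", "s3", "s4"]
    annotations = [
        ["positive", "positive", "positive"],
        ["negative", "neutral", "negative"],
        ["mixed", "mixed", "mixed", "mixed"],
        ["neutral", "neutral", "positive", "neutral", "neutral"],
    ]
    print(full_agreement_rate(annotations))  # 0.5
    print(unanimous_subset(sentences, annotations))  # [('s1', 'positive'), ('s3', 'mixed')]
    print(label_distribution(annotations))
```

Chance-corrected measures such as Fleiss' kappa would complement this raw rate, since with four sentiment classes some agreement occurs by chance; whether the paper uses them is not stated in the summary.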