Annotation of Text Corpora by Sentiment and Irony in a Project of Citizen Science
Published in | Automatic Control and Computer Sciences, Vol. 58, no. 7, pp. 797–807 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Moscow; Pleiades Publishing; 01.12.2024; Springer Nature B.V. |
ISSN | 0146-4116 1558-108X |
DOI | 10.3103/S0146411624700263 |
Summary: | This paper studies the construction of a corpus of sentences annotated by general sentiment into four classes (positive, negative, neutral, and mixed), a corpus of phrasemes annotated by sentiment into three classes (positive, negative, and neutral), and a corpus of sentences annotated by the presence or absence of irony. The annotation is conducted by volunteers within the project Preparing Texts for Algorithms on the People of Science website. Based on the available knowledge of the subject area for each of the problems, guidelines for the annotators are compiled. A methodology for the statistical processing of the annotation results is also developed based on analyzing the distributions and agreement measures of the annotations of different annotators. For annotating sentences by irony and phrasemes by sentiment, the agreement measures are quite high (the full agreement rate is 0.60–0.99), while for annotating sentences by general sentiment, the agreement is low (the full agreement rate is 0.40), apparently due to the higher complexity of the problem. It is also shown that the performance of automatic algorithms for sentence sentiment analysis improves by 12–13% when using a corpus on whose sentences all annotators (3–5 people) agree compared with a corpus annotated by only one volunteer. |
---|---|
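
The summary reports a "full agreement rate" and a filtered corpus that keeps only the sentences on which all annotators agree. This record does not give the paper's exact definitions, so the following is only a minimal sketch under one common reading: the rate is the share of items for which every annotator chose the same label. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def full_agreement_rate(annotations):
    """Return the share of items on which all annotators gave the same label.

    `annotations` is a list with one entry per item; each entry is the list
    of labels assigned to that item by its annotators (assumed layout).
    """
    if not annotations:
        raise ValueError("no annotated items")
    # An item counts as fully agreed when its labels collapse to one value.
    agreed = sum(1 for labels in annotations if len(set(labels)) == 1)
    return agreed / len(annotations)


def keep_unanimous(items, annotations):
    """Keep only the items with unanimous labels, paired with that label."""
    return [
        (item, labels[0])
        for item, labels in zip(items, annotations)
        if len(set(labels)) == 1
    ]


# Toy data: five sentences, three annotators each (invented for illustration).
toy_items = ["s1", "s2", "s3", "s4", "s5"]
toy_annotations = [
    ["positive", "positive", "positive"],
    ["negative", "positive", "negative"],
    ["neutral", "neutral", "neutral"],
    ["mixed", "negative", "positive"],
    ["negative", "negative", "negative"],
]

rate = full_agreement_rate(toy_annotations)
unanimous = keep_unanimous(toy_items, toy_annotations)
```

On the toy data, three of the five sentences are unanimous, so `unanimous` holds the kind of filtered subset that the summary compares against a corpus labelled by a single volunteer.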