Identification of Affective States Based on Automatic Analysis of Texts of Comments in Social Networks

Bibliographic Details
Published in Automation and Remote Control Vol. 83; no. 12; pp. 1877-1885
Main Author Dyulicheva, Yu. Yu
Format Journal Article
Language English
Published Moscow Pleiades Publishing 01.12.2022
Springer Nature B.V
Summary: The paper considers the problem of classifying 3553 English-language comments from the social network Reddit using several approaches to vectorizing comment texts: bag of words, TF–IDF, bigram analysis based on pointwise mutual information (PMI) and sentiment, and the deep language representation model BERT. A hybrid approach that combines BERT-based text vectorization with bigram analysis has made it possible to improve the quality of comment classification to 91%. A cluster analysis of 1857 English-language comments describing anxiety identified clusters using BERT+k-means. The study proposes a hybrid approach that combines the LDA topic modeling method, the VADER sentiment analysis method, pointwise mutual information, and part-of-speech analysis to select bigrams and trigrams that describe clusters of comments. To visualize the extracted patterns in the form of trigrams, a knowledge graph describing the subject area was constructed. Comparing the words of the selected target trigrams with the words of a custom dictionary describing various affective disorders has made it possible to determine the types of psychosocial stressors associated with affective disorders.
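The PMI-based bigram analysis mentioned in the summary can be illustrated with a minimal sketch. It is not the author's implementation: the function name `bigram_pmi`, the `min_count` parameter, the whitespace tokenization, and the three sample comments are assumptions made for illustration. The score used is the standard definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated from raw counts.

```python
import math
from collections import Counter


def bigram_pmi(tokenized_comments, min_count=1):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies over all unigrams and all adjacent bigrams.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in tokenized_comments:
        unigrams.update(tokens)
        # Adjacent pairs only, never crossing comment boundaries.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unstable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Made-up sample comments (assumption, not data from the paper).
comments = [
    "i feel panic attack again".split(),
    "panic attack at work".split(),
    "i feel fine at work".split(),
]
scores = bigram_pmi(comments)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A higher score means the two words co-occur more often than their individual frequencies would predict. On a real corpus, raising `min_count` is a common way to keep rare pairs from dominating the ranking.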
ISSN: 0005-1179, 1608-3032
DOI: 10.1134/S00051179220120025