Multilingual Sentiment Analysis and Toxicity Detection for Text Messages in Russian

Bibliographic Details
Published in: 2021 29th Conference of Open Innovations Association (FRUCT), Vol. 29, no. 1, pp. 55-64
Main Authors: Bogoradnikova, Darya; Makhnytkina, Olesia; Matveev, Anton; Zakharova, Anastasia; Akulov, Artem
Format: Conference Proceeding; Journal Article
Language: English
Published: FRUCT, 12.05.2021
Summary: In this paper, we discuss an approach to sentiment analysis and emotion identification for user comments. The solution has three parts: 1) topic detection, 2) sentiment evaluation, and 3) toxicity detection with toxic span localization. The lack of sufficiently large training data for the Russian language is handled by using multilingual word embeddings, an adversarial domain adaptation model, and data augmentation. We present an overview of various preprocessing pipelines for topic modeling and highlight the LDA-Mallet model, which demonstrates the best performance. For sentiment analysis and toxicity detection, we examine the efficacy of a support vector machine and of a deep neural network with a multilingual language model and adversarial domain adaptation, which allows us to train the algorithms on English-language datasets. All methods are tested on a dataset of user comments on various online courses and adjusted to support the development of a virtual dialogue assistant for conducting virtual exams.
ISSN: 2305-7254, 2343-0737
DOI: 10.23919/FRUCT52173.2021.9435584
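
The summary lists toxic span localization as the third stage but does not say how spans are represented. As a rough illustration only, the sketch below treats a span as a pair of character offsets and finds them with a toy lexicon lookup. This is not the authors' method (the paper uses a support vector machine and a deep neural network); the lexicon `TOXIC_LEXICON` and the function names `toxic_spans` and `is_toxic` are hypothetical and exist only for this example.

```python
import re

# Hypothetical toy lexicon, assumed for illustration only.
# It is not taken from the paper or from any real dataset.
TOXIC_LEXICON = {"idiot", "stupid", "дурак"}


def toxic_spans(text, lexicon=TOXIC_LEXICON):
    """Return (start, end) character offsets of tokens found in the lexicon.

    Offsets follow Python slicing: text[start:end] is the flagged token.
    """
    spans = []
    # In Python 3, \w matches Unicode word characters, so Cyrillic works too.
    for match in re.finditer(r"\w+", text):
        if match.group().lower() in lexicon:
            spans.append((match.start(), match.end()))
    return spans


def is_toxic(text, lexicon=TOXIC_LEXICON):
    """Message-level decision: toxic if at least one span was found."""
    return bool(toxic_spans(text, lexicon))
```

Returning offsets rather than the matched words keeps the result tied to the original message, so a caller can highlight or mask exactly the flagged characters. A learned model would replace the lexicon lookup with per-token predictions while keeping the same output shape.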