Multilingual Sentiment Analysis and Toxicity Detection for Text Messages in Russian

Bibliographic Details
Published in	2021 29th Conference of Open Innovations Association (FRUCT), Vol. 29, no. 1, pp. 55-64
Main Authors	Bogoradnikova, Darya; Makhnytkina, Olesia; Matveev, Anton; Zakharova, Anastasia; Akulov, Artem
Format	Conference Proceeding; Journal Article
Language	English
Published	FRUCT 12.05.2021
Subjects	Adaptation models; multilingual model; Pipelines; Sentiment analysis; Support vector machines; Technological innovation; topic modeling; toxicity detection; Toxicology; Training

More Information
Summary:	In this paper, we discuss an approach to sentiment analysis and emotion identification for user comments. The solution has three parts: 1) topic detection, 2) sentiment evaluation, and 3) toxicity detection with toxic span localization. The lack of sufficiently large training data for the Russian language is addressed with multilingual word embeddings, an adversarial domain adaptation model, and data augmentation. We present an overview of various preprocessing pipelines for topic modeling and highlight the LDA-Mallet model, which demonstrates the best performance. For sentiment analysis and toxicity detection, we examine the efficacy of a support vector machine and of a deep neural network with a multilingual language model and adversarial domain adaptation, which allows us to train the algorithms on English-language datasets. All methods are tested on a dataset of user comments on various online courses and adjusted to support the development of a virtual dialogue assistant for conducting virtual exams.
ISSN:	2305-7254 2343-0737
DOI:	10.23919/FRUCT52173.2021.9435584