Multilingual Sentiment Analysis and Toxicity Detection for Text Messages in Russian
Published in | 2021 29th Conference of Open Innovations Association (FRUCT) Vol. 29; no. 1; pp. 55 - 64 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding, Journal Article |
Language | English |
Published | FRUCT, 12.05.2021 |
Summary: | In this paper, we discuss an approach to sentiment analysis and emotion identification for user comments. The solution is threefold: 1) topic detection, 2) sentiment evaluation, 3) toxicity detection and toxic spans localization. The lack of sufficiently large training data for the Russian language is handled by utilizing multilingual word embeddings, an adversarial domain adaptation model, and data augmentation. We present an overview of various preprocessing pipelines for topic modeling and highlight the LDA-Mallet model, which demonstrates the best performance. For sentiment analysis and toxicity detection, we examine the efficacy of a support vector machine and a deep neural network with a multilingual language model and adversarial domain adaptation, which allows us to train algorithms on English-language datasets. All methods are tested on a dataset of user comments on various online courses and adjusted to support the development of a virtual dialogue assistant for conducting virtual exams. |
---|---|
ISSN: | 2305-7254 2343-0737 |
DOI: | 10.23919/FRUCT52173.2021.9435584 |