Creating a model of semantic analysis of extremist texts in the Kazakh language
Published in | Vestnik KazNU. Serii͡a︡ matematika, mekhanika, informatika Vol. 121; no. 1; pp. 110 - 121 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | 09.04.2024 |
Summary: | Semantic analysis of Kazakh-language texts and opinions on social networks is receiving considerable attention as a way to identify suspicious or extremist content. This article explores the application of machine learning and deep learning techniques to the detection of extremist content in text. The investigation considers oversampling and undersampling during feature processing, the distinction between extremist and neutral topics, and the challenges of imbalanced classification, and these considerations lead to the development of a deep learning model for text classification. Several machine learning models are applied to detect extremist content, and a comparative analysis, which takes oversampling and undersampling into account as remedies for data imbalance, is conducted to determine the most effective approach. The research is divided into two subtasks: building a machine learning model for detecting extremist content in text, and constructing a deep learning model that accounts for the characteristics of the Kazakh language and the available dataset. The study also examines feature processing and compares the results of machine learning algorithms used to classify religious extremism, each using a different combination of features. The methods explored are decision trees, random forests, support vector machines, k-nearest neighbors, logistic regression, and naive Bayes. The research contributes to text mining, artificial intelligence, and machine learning, offers practical recommendations for processing and categorizing texts linked to religious extremism, and underscores the current relevance of semantic analysis of extremist texts written in Kazakh. |
---|---|
ISSN: | 1563-0277 2617-4871 |
DOI: | 10.26577/JMMCS2024121111 |
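
The summary names oversampling and undersampling as remedies for class imbalance but does not say how they were implemented. The sketch below shows one common reading, random resampling, and is an illustration only, not the authors' method. The function names `random_oversample` and `random_undersample`, the placeholder strings, and the 8-to-2 class split are all assumptions made for this example.

```python
import random
from collections import Counter


def _group_by_label(texts, labels):
    """Collect the texts belonging to each label, preserving first-seen order."""
    groups = {}
    for text, label in zip(texts, labels):
        groups.setdefault(label, []).append(text)
    return groups


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = max(len(items) for items in groups.values())
    out_texts, out_labels = [], []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Keep a random subset of each class so every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_label(texts, labels)
    target = min(len(items) for items in groups.values())
    out_texts, out_labels = [], []
    for label, items in groups.items():
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical imbalanced toy data: placeholder strings stand in for real documents.
texts = [f"neutral_doc_{i}" for i in range(8)] + [f"flagged_doc_{i}" for i in range(2)]
labels = ["neutral"] * 8 + ["extremist"] * 2

over_texts, over_labels = random_oversample(texts, labels)
under_texts, under_labels = random_undersample(texts, labels)

print(Counter(labels))
print(Counter(over_labels))
print(Counter(under_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated examples do not leak into the evaluation data.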