A Study of Discriminatory Speech Classification Based on Improved Smote and SVM-RF

Bibliographic Details
Published in	Applied sciences Vol. 14; no. 15; p. 6468
Main Authors	Wu, Chao, Hu, Huijuan, Zhu, Dingju, Shan, Xilin, Yung, Kai-Leung, Ip, Andrew W. H.
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.08.2024
Subjects	Accuracy; Algorithms; Classification; Computational linguistics; Data mining; Datasets; Dictionaries; discrimination speech; Hate speech; Information management; integration method; Internet; Language processing; latent Dirichlet allocation; Machine learning; Maximum entropy method; Natural language interfaces; Natural language processing; Probability distribution; random forest; Social media; Social networks; Sparsity; support vector machine; Text categorization; China

Summary:	The rapid development of the Internet has made it easier to express, share, and interact on social networks, but some of this speech contains harmful discrimination, so classifying it is crucial. In this paper, we collect discriminatory data from Sina Weibo and propose an improved Synthetic Minority Over-sampling Technique (SMOTE) algorithm based on Latent Dirichlet Allocation (LDA) to improve data quality and balance. We also propose a new integration method that combines Support Vector Machine (SVM) and Random Forest (RF). The experimental results show that the integrated model improves precision, recall, and F1 score by 6.0%, 5.4%, and 5.7%, respectively, compared with SVM alone, and that it outperforms the other machine learning methods tested. Ablation experiments further confirm the positive impact of both the improved SMOTE and the integrated method on classification.
ISSN:	2076-3417
DOI:	10.3390/app14156468
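
Illustrative sketch (not taken from the article): the summary names two building blocks, SMOTE over-sampling and a combination of SVM and RF outputs. The record does not describe the LDA-based improvement or the actual integration rule, so the code below shows only classic SMOTE interpolation and a plain probability average as one common way to combine two classifiers. The function names `smote` and `average_probabilities`, the parameter `k`, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours. This is NOT the LDA-based
    variant proposed in the article."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def average_probabilities(p_svm, p_rf):
    """Assumed combination rule: average the per-class probabilities
    of two classifiers (simple soft voting)."""
    return [(a + b) / 2 for a, b in zip(p_svm, p_rf)]


# Toy minority-class feature vectors (hypothetical data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
combined = average_probabilities([0.2, 0.8], [0.6, 0.4])
```

Each synthetic point lies on the line segment between two existing minority samples, which is why over-sampling of this kind balances class sizes without duplicating rows exactly.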