Sexual Violence Classification as Hate Speech using Indonesian Tweet
Published in | 2022 International Symposium on Information Technology and Digital Innovation (ISITDI) pp. 114 - 120 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 27.07.2022 |
Summary: | Hate speech is communication, delivered either directly or through the media, by groups or individuals with the aim of provoking, inciting, or insulting another group or individual. 3,640 cases of hate speech have spread across various social media, and 677 KBGO (Kekerasan Berbasis Gender Online, i.e., online gender-based violence) cases, dominated by sexual violence, have spread through online media. Our research aims to find the classification model with the highest accuracy by comparing several combinations of machine learning techniques. We collected 9,035 Twitter user opinions to use as a dataset. Of the 6,089 opinions that were successfully annotated, 5,102 were classified as non-hate speech and 987 as hate speech. We propose an SVM classification model with TF-IDF (unigram) feature extraction, combined with the oversampling methods random oversampling (ROS) and SMOTE, to address the imbalanced dataset and improve classification performance. The classification model using the SVM algorithm achieves the best accuracy, 0.942, with an F1-score of 0.940. |
---|---|
DOI: | 10.1109/ISITDI55734.2022.9944482 |
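The summary names the pipeline stages (TF-IDF unigram features, ROS or SMOTE oversampling of the minority class, an SVM classifier, and accuracy/F1 evaluation) but not how they were implemented. What follows is a minimal, dependency-free Python sketch of such a pipeline under stated assumptions, not the authors' code. The regex tokenizer, the smoothed-IDF formula, the linear SVM trained with Pegasos-style stochastic subgradient descent (swapped in for whatever SVM solver the paper used), every function name, and the toy texts are all hypothetical illustrations. F1 here is the binary score with hate speech (label 1) as the positive class. The summary does not say how its F1-score was computed or averaged.

```python
import math
import random
import re
from collections import Counter

def tokenize(text):
    # Assumption: unigrams are lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def fit_tfidf(docs):
    """Build a unigram vocabulary and smoothed IDF weights from training docs."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    # One common IDF variant (assumed): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform_tfidf(docs, vocab, idf):
    """Map docs to L2-normalized TF-IDF vectors; raw counts serve as TF."""
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))  # out-of-vocabulary tokens are ignored
        v = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def _minority_majority(y):
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    return minority, counts[majority] - counts[minority]

def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen minority rows until the classes balance."""
    minority, deficit = _minority_majority(y)
    pool = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [minority] * len(extra)

def smote(X, y, rng, k=5):
    """SMOTE: synthesize minority rows by interpolating from a random minority
    row toward one of its k nearest minority neighbours (Euclidean distance)."""
    minority, deficit = _minority_majority(y)
    pool = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(deficit):
        base = rng.choice(pool)
        neighbours = sorted((p for p in pool if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours) if neighbours else base
        gap = rng.random()  # uniform position along the segment base -> nn
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nn)])
    return X + synthetic, y + [minority] * len(synthetic)

def train_linear_svm(X, y, rng, lam=0.01, epochs=50):
    """Linear SVM (hinge loss + L2 penalty) via Pegasos-style SGD.
    A constant 1.0 feature stands in for the bias term (a simplification)."""
    data = [(x + [1.0], 1 if label == 1 else -1) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, s in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = s * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink from L2 penalty
            if margin < 1:  # hinge loss is active: step toward this sample
                w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w

def predict(w, X):
    return [1 if sum(wi * xi for wi, xi in zip(w, x + [1.0])) >= 0 else 0
            for x in X]

def accuracy_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical placeholder texts (0 = non-hate, 1 = hate); not the paper's data.
    docs = ["have a nice day", "nice weather today", "good morning friends",
            "lovely nice morning", "you are a vile insult", "vile insult again"]
    labels = [0, 0, 0, 0, 1, 1]
    vocab, idf = fit_tfidf(docs)
    X = transform_tfidf(docs, vocab, idf)
    X_bal, y_bal = smote(X, labels, rng, k=1)  # or random_oversample(X, labels, rng)
    w = train_linear_svm(X_bal, y_bal, rng)
    test = transform_tfidf(["nice day", "vile insult"], vocab, idf)
    print(accuracy_f1([0, 1], predict(w, test)))
```

Oversampling is applied only to the training vectors here; in practice, resampling before a train/test split would leak duplicated or interpolated minority rows into the evaluation set. ROS and SMOTE are interchangeable at the marked line: ROS copies existing minority vectors, while SMOTE creates new points on the line segments between nearby minority vectors.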