Sexual Violence Classification as Hate Speech using Indonesian Tweet

Bibliographic Details
Published in: 2022 International Symposium on Information Technology and Digital Innovation (ISITDI), pp. 114-120
Main Authors: Ramadhan, Muammar Notareza; Budi, Indra; Santoso, Aris Budi; Suryono, Ryan Randy
Format: Conference Proceeding
Language: English
Published: IEEE, 27.07.2022
Summary: Hate speech is communication, delivered directly or through media by groups or individuals, that aims to provoke, incite, or insult another group or individual. A total of 3,640 instances of hate speech have spread across various social media. There were also 677 KBGO (online gender-based violence) cases, dominated by sexual violence cases spread through online media. Our research aims to produce the best classification model with high accuracy by comparing several combinations of machine learning techniques. We collected 9,035 Twitter user opinions to be used as a dataset. Of the 6,089 opinions that were successfully annotated, 5,102 were classified as non-hate speech and 987 as hate speech. We propose an SVM classification model with TF-IDF (unigram) as the feature extraction method and oversampling methods such as ROS and SMOTE to address the imbalanced dataset problem and improve classification performance. The classification model with the SVM algorithm reaches the best accuracy, 0.942, with an F1-score of 0.940.
DOI:10.1109/ISITDI55734.2022.9944482
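
The summary names two preprocessing steps, unigram TF-IDF features and random oversampling (ROS) of the minority class. The sketch below illustrates both in plain Python. It is not the authors' implementation: the function names `unigram_tfidf` and `random_oversample`, the whitespace tokenizer, the smoothed IDF formula, and the toy texts are all assumptions made for illustration. SVM training and SMOTE are omitted.

```python
import math
import random
from collections import Counter


def unigram_tfidf(docs):
    """Return one {term: weight} dict per document using unigram TF-IDF."""
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, so terms found in every document
        # still receive a positive weight.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement from the existing samples of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy placeholder texts, not taken from the paper's dataset.
texts = [
    "contoh tweet netral",
    "tweet netral lain",
    "tweet biasa saja",
    "contoh ujaran kebencian",
]
labels = [0, 0, 0, 1]  # 0 = non-hate speech, 1 = hate speech

vectors = unigram_tfidf(texts)
balanced_vectors, balanced_labels = random_oversample(vectors, labels)
print(Counter(balanced_labels))
```

With the class counts reported in the summary (5,102 non-hate speech versus 987 hate speech), this kind of oversampling would duplicate minority samples until both classes are the same size. In practice, oversampling is usually applied to the training split only, so duplicated samples do not leak into evaluation.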