Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm

Bibliographic Details
Published in 2023 IEEE International Conference on Contemporary Computing and Communications (InC4) Vol. 1; pp. 1-6
Main Authors Kumar, Aparna Ashok; Pati, Peeta Basa; Deepa, K.; Sangeetha, S. Tresa
Format Conference Proceeding
Language English
Published IEEE 21.04.2023
Summary: The growing popularity of social media platforms and microblogging websites has led to an increase in the expression of views and opinions. However, conversations and debates on these platforms often lead to toxic comments, which consist of insulting and hateful remarks. To address this issue, it is important for social media systems to be able to recognize harmful comments. With the rising incidence of cyberbullying, it is crucial to study the classification of toxic comments using various algorithms. This study compares the effectiveness of different word and sentence embedding methods, including TF-IDF, InferSent, BERT, and T5, for toxic comment classification. A comparative study is also conducted on the impact of using SMOTE to balance the highly imbalanced dataset. The results of these models are compared and analysed. It is observed that the T5 embedding with a Random Forest classifier performs best, achieving an F1-score of 0.91.
DOI: 10.1109/InC457730.2023.10263218