Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm

Bibliographic Details
Published in	2023 IEEE International Conference on Contemporary Computing and Communications (InC4), Vol. 1, pp. 1-6
Main Authors	Kumar, Aparna Ashok; Pati, Peeta Basa; Deepa, K.; Sangeetha, S. Tresa
Format	Conference Proceeding
Language	English
Published	IEEE 21.04.2023
Subjects	Analytical models; Blogs; Classification; comments; Computational modeling; Cyberbullying; Deep learning; InferSent; Measurement; Oral communication; S-Bert; TF-IDF; toxic

Summary:	The growing popularity of social media platforms and microblogging websites has led to an increase in the expression of views and opinions. However, conversations and debates on these platforms often give rise to toxic comments, which consist of insulting and hateful remarks. To address this issue, social media systems need to be able to recognize harmful comments. With the rising incidence of cyberbullying, it is crucial to study the classification of toxic comments using various algorithms. This study compares the effectiveness of different word and sentence embedding methods, including TF-IDF, InferSent, BERT, and T5, for toxic comment classification. It also compares the impact of using SMOTE to balance the highly imbalanced dataset. The results of these models are compared and analysed. The T5 embedding with a Random Forest classifier is observed to perform best, with an F1-score of 0.91.
DOI:	10.1109/InC457730.2023.10263218
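
The summary mentions using SMOTE to balance the imbalanced dataset. As a rough illustration of that idea only, and not the authors' implementation, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy two-dimensional "embeddings" are all hypothetical; in practice a maintained implementation such as the one in the imbalanced-learn library would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy 2-D vectors standing in for comment embeddings (made-up numbers):
# six non-toxic (majority) comments versus two toxic (minority) comments.
majority = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4], [0.1, 0.0]]
minority = [[0.9, 0.8], [0.7, 1.0]]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Only the training split would normally be oversampled this way, so that evaluation metrics such as the F1-score are still computed on real, unmodified comments.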