Contents-Based Spam Detection on Social Networks Using RoBERTa Embedding and Stacked BLSTM

Bibliographic Details
Published in SN Computer Science, Vol. 4, no. 4, p. 380
Main Authors Ghanem, Razan, Erbay, Hasan, Bakour, Khaled
Format Journal Article
Language English
Published Singapore Springer Nature Singapore 01.07.2023
Springer Nature B.V
Summary: The use of social networks has become an integral part of our daily lives. Although social networking sites offer many advantages, they also pose a number of problems for their users. One of the best-known problems is unwanted messages: social network users do not want to be bothered by annoying, time-wasting messages. These unwanted messages, which include ads, malicious content, and other low-quality content, are called spam. Combating spam on social networks is challenging because messages exchanged through social media are short and sparse, and may contain grammatical and spelling errors as well as complex characters and special patterns. Solving this problem depends essentially on an appropriate representation of the text, which increases the efficiency of the classifier. Therefore, in this study, we introduce a RoBERTa-based bidirectional recurrent neural network model for spam detection on social networks. The RoBERTa model is used to learn contextualized word representations that improve the performance of the stacked BLSTM network. In addition, we conduct a comparative study in which the most common transformer-based models are applied to the spam problem. The experimental results on three benchmark data sets show that our RoBERTa–BLSTM model outperforms all common models used to detect spam on social networks, with accuracies of 98.15%, 94.41%, and 99.74% on the Twitter, YouTube, and SMS data sets, respectively.
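Illustrative sketch (not the authors' code): the summary describes feeding RoBERTa's contextualized word representations into a stacked BLSTM classifier. The PyTorch snippet below shows one plausible shape for the BLSTM half of such a model. Everything specific in it is an assumption: the class name `StackedBLSTMClassifier` is hypothetical, the 768-dimensional input assumes the base-size RoBERTa variant, the hidden size and the two-layer depth are placeholders, and random tensors stand in for real RoBERTa outputs so that the example runs without downloading any weights.

```python
import torch
from torch import nn


class StackedBLSTMClassifier(nn.Module):
    """Hypothetical stacked bidirectional LSTM over precomputed token embeddings."""

    def __init__(self, embed_dim=768, hidden_dim=128, num_layers=2, num_classes=2):
        super().__init__()
        # num_layers > 1 stacks the LSTMs; bidirectional=True reads each
        # sequence left-to-right and right-to-left.
        self.blstm = nn.LSTM(
            input_size=embed_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            bidirectional=True,
            batch_first=True,
        )
        # Forward and backward final states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.blstm(embeddings)
        # h_n: (num_layers * 2, batch, hidden_dim); the last two entries are
        # the top layer's forward and backward final hidden states.
        final = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.classifier(final)  # (batch, num_classes) spam/ham logits


torch.manual_seed(0)
# Assumption: random tensors stand in for RoBERTa's last hidden states
# (4 messages, 16 tokens each, 768 features per token).
fake_embeddings = torch.randn(4, 16, 768)
model = StackedBLSTMClassifier()
logits = model(fake_embeddings)
print(logits.shape)  # torch.Size([4, 2])
```

In a full pipeline, the random tensor would be replaced by the per-token outputs of a pretrained RoBERTa encoder, and the logits would be trained with a cross-entropy loss on labeled spam and non-spam messages. The record does not state the hyperparameters the authors used.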
ISSN: 2661-8907
2662-995X
DOI: 10.1007/s42979-023-01798-x