Contents-Based Spam Detection on Social Networks Using RoBERTa Embedding and Stacked BLSTM

Bibliographic Details
Published in	SN Computer Science, Vol. 4, no. 4, p. 380
Main Authors	Ghanem, Razan; Erbay, Hasan; Bakour, Khaled
Format	Journal Article
Language	English
Published	Singapore: Springer Nature Singapore, 01.07.2023; Springer Nature B.V.
Subjects	Algorithms; Blacklisting; Comparative studies; Computer Imaging; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Datasets; Deep learning; Information Systems and Communication Service; Language; Messages; Methods; Natural language processing; Neural networks; Original Research; Pattern Recognition and Graphics; Recurrent neural networks; Representations; Social networks; Software Engineering/Programming and Operating Systems; Sparsity; Support vector machines; Vision; Stacked BLSTM; Word embedding; BERT; Transformer; RoBERTa; Spam detection
Summary:	Social networks have become an integral part of daily life. Although social networking sites offer many advantages, they also pose a number of problems for their users. One of the best-known problems is unwanted messages: users do not want to be bothered by annoying and time-wasting messages. These unwanted messages, which include ads, malicious content, and other low-quality content, are called spam. Combating spam on social networks is challenging because messages exchanged through social media are short and sparse, and may contain grammatical and spelling errors as well as complex characters and special patterns. Solving this problem depends essentially on an appropriate text representation that increases the efficiency of the classifier. Therefore, in this study, we introduce a RoBERTa-based bidirectional recurrent neural network model for spam detection on social networks. The RoBERTa model is used to learn contextualized word representations that improve the performance of the stacked BLSTM network. In addition, we conduct a comparative study in which the most common transformer-based models are applied to the spam problem. The experimental results on three benchmark data sets show that our RoBERTa–BLSTM model outperforms all common models used to detect spam on social networks, with accuracies of 98.15%, 94.41%, and 99.74% on the Twitter, YouTube, and SMS data sets, respectively.
ISSN:	2662-995X, 2661-8907
DOI:	10.1007/s42979-023-01798-x
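
The summary describes contextual token embeddings feeding a stacked bidirectional LSTM (BLSTM) classifier. The following is a minimal, dependency-free sketch of that data flow, not the authors' implementation. Everything specific in it is an assumption made for illustration: the random stand-in vectors used in place of real RoBERTa embeddings, the tiny dimensions (8-dimensional inputs, 4 hidden units), the two-layer depth, the mean pooling over time steps, the single sigmoid output unit, and all function names. The weights are random and untrained, so the printed probability carries no meaning beyond showing that the pipeline runs end to end.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, rng):
    """Random weights for one LSTM direction: 4 gates over [x; h; bias]."""
    cols = input_dim + hidden_dim + 1  # +1 column for the bias input
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(4 * hidden_dim)]


def lstm_run(weights, hidden_dim, seq):
    """Run one LSTM direction over seq; return the hidden state at each step."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    outputs = []
    for x in seq:
        z = x + h + [1.0]  # current input, previous hidden state, bias input
        pre = [sum(w * v for w, v in zip(row, z)) for row in weights]
        i = [sigmoid(p) for p in pre[:hidden_dim]]                       # input gate
        f = [sigmoid(p) for p in pre[hidden_dim:2 * hidden_dim]]         # forget gate
        g = [math.tanh(p) for p in pre[2 * hidden_dim:3 * hidden_dim]]   # candidate
        o = [sigmoid(p) for p in pre[3 * hidden_dim:]]                   # output gate
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


def blstm_layer(fwd_w, bwd_w, hidden_dim, seq):
    """Concatenate left-to-right and right-to-left hidden states per time step."""
    fwd = lstm_run(fwd_w, hidden_dim, seq)
    bwd = lstm_run(bwd_w, hidden_dim, seq[::-1])[::-1]  # re-align to input order
    return [a + b for a, b in zip(fwd, bwd)]


def build_model(embed_dim, hidden_dim, num_layers, seed=0):
    """Hypothetical helper: random, untrained weights for the stack and the head."""
    rng = random.Random(seed)
    layers = []
    in_dim = embed_dim
    for _ in range(num_layers):
        layers.append((init_lstm(in_dim, hidden_dim, rng),
                       init_lstm(in_dim, hidden_dim, rng)))
        in_dim = 2 * hidden_dim  # the next layer sees both directions
    head = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim + 1)]
    return {"hidden_dim": hidden_dim, "layers": layers, "head": head}


def spam_probability(model, token_vectors):
    """Stacked BLSTM, then mean pooling and a sigmoid unit (both assumed)."""
    seq = token_vectors
    for fwd_w, bwd_w in model["layers"]:
        seq = blstm_layer(fwd_w, bwd_w, model["hidden_dim"], seq)
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    logit = sum(w * v for w, v in zip(model["head"], pooled + [1.0]))
    return sigmoid(logit)


# Stand-in for RoBERTa output: 5 tokens, each an 8-dimensional vector.
rng = random.Random(1)
tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
model = build_model(embed_dim=8, hidden_dim=4, num_layers=2)
prob = spam_probability(model, tokens)
print(f"spam probability (untrained): {prob:.3f}")
```

In a real setting the `tokens` list would be replaced by the per-token hidden states of a pretrained RoBERTa encoder (768-dimensional for the base model), and the weights would be learned from labeled messages rather than drawn at random.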