Contents-Based Spam Detection on Social Networks Using RoBERTa Embedding and Stacked BLSTM
Published in | SN Computer Science Vol. 4; no. 4; p. 380 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Singapore; Springer Nature Singapore; 01.07.2023; Springer Nature B.V |
Subjects | |
Summary: | The use of social networks has become an integral part of our daily lives. Although social networking sites offer many advantages, they also pose a number of problems for their users. One of the best-known problems is unwanted messages: social network users do not want to be bothered by annoying, time-wasting messages. These unwanted messages, which include ads, malicious content, and other low-quality content, are called spam. Combating spam on social networks is challenging because messages exchanged through social media are short and sparse, and may contain grammatical and spelling errors in addition to complex characters and special patterns. Solving this problem therefore depends essentially on an appropriate text representation that increases the efficiency of the classifier. In this study, we introduce a RoBERTa-based bidirectional recurrent neural network model for spam detection on social networks. The RoBERTa model learns contextualized word representations that improve the performance of a stacked BLSTM network. We also conduct a comparative study that applies the most common transformer-based models to the spam problem. Experimental results on three benchmark data sets show that our RoBERTa–BLSTM model outperforms all the common models used to detect spam on social networks, with accuracies of 98.15%, 94.41%, and 99.74% on the Twitter, YouTube, and SMS data sets, respectively. |
---|---|
ISSN: | 2661-8907, 2662-995X |
DOI: | 10.1007/s42979-023-01798-x |
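The summary describes contextual token vectors from RoBERTa being fed into a stacked bidirectional LSTM (BLSTM) classifier. The following is a minimal NumPy sketch of that data flow (a forward pass only), not the authors' implementation. The hidden size, the number of layers, the use of each direction's final state as the sentence vector, and the sigmoid output head are all assumptions made for illustration. The weights are random and untrained, so the output probability carries no meaning until the model is trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, input_dim, hidden_dim):
    # One stacked weight matrix for the four gates (input, forget, candidate, output),
    # applied to the concatenation [x_t, h_{t-1}].
    scale = 1.0 / np.sqrt(hidden_dim)
    W = rng.uniform(-scale, scale, size=(4 * hidden_dim, input_dim + hidden_dim))
    b = np.zeros(4 * hidden_dim)
    return W, b


def lstm_pass(x, W, b):
    """Run one LSTM direction over x of shape (seq_len, input_dim).

    Returns the hidden state at every time step, shape (seq_len, hidden_dim).
    """
    hidden_dim = b.shape[0] // 4
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    outputs = []
    for x_t in x:
        z = W @ np.concatenate([x_t, h]) + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # expose the gated hidden state
        outputs.append(h)
    return np.stack(outputs)


class StackedBLSTMClassifier:
    """Hypothetical stacked BLSTM head over precomputed contextual embeddings."""

    def __init__(self, embed_dim=768, hidden_dim=64, num_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        self.layers = []
        in_dim = embed_dim
        for _ in range(num_layers):
            fwd = init_lstm(rng, in_dim, hidden_dim)
            bwd = init_lstm(rng, in_dim, hidden_dim)
            self.layers.append((fwd, bwd))
            in_dim = 2 * hidden_dim  # the next layer sees both directions concatenated
        self.w_out = rng.normal(0.0, 0.1, size=2 * hidden_dim)
        self.b_out = 0.0

    def forward(self, embeddings):
        """embeddings: (seq_len, embed_dim) token vectors, e.g. RoBERTa outputs."""
        x = embeddings
        for (Wf, bf), (Wb, bb) in self.layers:
            h_fwd = lstm_pass(x, Wf, bf)
            # Run the backward LSTM on the reversed sequence, then re-align it in time.
            h_bwd = lstm_pass(x[::-1], Wb, bb)[::-1]
            x = np.concatenate([h_fwd, h_bwd], axis=1)
        # Sentence vector: last forward state plus the backward state at position 0
        # (the final state of each direction).
        features = np.concatenate([x[-1, :self.hidden_dim], x[0, self.hidden_dim:]])
        return sigmoid(features @ self.w_out + self.b_out)  # assumed P(spam)
```

In a real pipeline, the `embeddings` argument would be the per-token output of a pretrained RoBERTa encoder for one message (for `roberta-base`, 768 dimensions per token), and the whole network would be trained end to end or on frozen embeddings in a deep learning framework rather than run with random weights.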