Behavior Analysis Based SMS Spammer Detection in Mobile Communication Networks

Bibliographic Details
Published in 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), pp. 538-543
Main Authors Zhang Bin, Zhao Gang, Feng Yunbo, Zhang Xiaolu, Jiang Weiqiang, Dai Jing, Gao Jiafeng
Format Conference Proceeding
Language English
Published IEEE 01.06.2016

Summary: Automatic short message service (SMS) spammer detection in a communication network is a major challenge for telecommunication operators, especially with the development of rich communication services (RCS). Three main problems exist in research and in practice: (1) spam detection techniques based on the full SMS content cannot easily be deployed on the network side because of user privacy concerns; (2) traditional filters that combine keywords with sending frequency can easily be bypassed by adding interference words; and (3) most existing methods have a low precision rate, so a great deal of manual review is needed after automatic filtering. To address these gaps, we study user behavior characteristics. A two-dimensional visualization indicates that no combination of two user behavior attributes can distinguish the abnormal users from the whole set by splitting the two-dimensional space. Thus, multiple user behavior attributes are integrated to train classifiers on a labeled set using several machine learning algorithms: decision tree, random forest, support vector machine (SVM), logistic regression, and self-organizing feature map (SOM). The performance comparison indicates that random forest is a good choice for balancing the tradeoff between precision rate and recall rate within an acceptable running time. The experimental results show that the proposed method, which requires no knowledge of SMS content, significantly improves precision rate and recall rate compared with the traditional method that combines keywords with sending frequency, which is used in most existing networks.
DOI: 10.1109/DSC.2016.48
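
The summary compares methods by precision rate and recall rate. As a minimal sketch of how those two figures are computed for a spammer detector, the snippet below scores two sets of predictions against the same labels. The function name `precision_recall` and all label and prediction values are made up for illustration; none of them come from the paper or its data.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (1 = spammer, 0 = normal user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spammers correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal users wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spammers missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 3 spammers and 5 normal users (not from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical output of a keyword + sending-frequency filter.
rule_pred = [1, 0, 0, 1, 1, 0, 0, 0]
# Hypothetical output of a classifier trained on behavior attributes.
model_pred = [1, 1, 0, 0, 0, 0, 0, 1]

for name, pred in [("rule-based", rule_pred), ("behavior-based", model_pred)]:
    p, r = precision_recall(y_true, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

A low precision means many flagged users are not spammers, which is what drives the manual review workload mentioned in the summary; a low recall means spammers slip through the filter.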