A Spam Filter approach with the Improved Machine Learning Technology

Bibliographic Details
Published in Third International Conference on Natural Computation (ICNC 2007) Vol. 2; pp. 484-488
Main Authors Xiu-Li Pang, Yu-Qiang Feng, Wei Jiang
Format Conference Proceeding
Language English
Published IEEE 01.08.2007
Summary: This paper presents an improved machine learning approach to automatic spam filtering. Firstly, to overcome the sparse data problem, which affects almost all corpus-based supervised learning classifiers, we propose incorporating a smoothing algorithm into the Naive Bayes (NB) model. Secondly, the support vector machine (SVM) is well suited to classification in high-dimensional feature spaces and generalizes well from small samples; a highly efficient SVM is presented that reduces the storage requirement and the computational time complexity. Thirdly, a specialty word extraction algorithm based on information entropy is presented, which improves the performance of Chinese word segmentation and, accordingly, of spam filtering in both English and Chinese. Experiments show that the improved NB gains 1.38% in precision, and that its precision is 0.69% higher than that of the SVM.
ISBN: 9780769528755, 0769528759
ISSN: 2157-9555
DOI: 10.1109/ICNC.2007.143
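
The summary's first idea, smoothing a Naive Bayes model so that sparse training data does not produce zero probabilities, can be illustrated with a minimal sketch. The summary does not say which smoothing algorithm the authors use; the code below assumes simple additive (Laplace) smoothing as a stand-in, and the function names, the `alpha` parameter, and the toy training set are all hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, alpha=1.0):
    """Collect per-class word counts from (tokens, label) pairs.

    alpha is the additive smoothing constant (an assumption; the paper's
    actual smoothing algorithm is not given in the summary).
    """
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha}

def log_posteriors(model, tokens):
    """Return the unnormalized log posterior of each class for a message."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # skip words never seen in training
            # Smoothed likelihood: a word unseen in this class still gets
            # a small nonzero probability instead of zero.
            score += math.log((counts[tok] + alpha)
                              / (total_words + alpha * vocab_size))
        scores[label] = score
    return scores

def classify(model, tokens):
    """Pick the class with the highest log posterior."""
    scores = log_posteriors(model, tokens)
    return max(scores, key=scores.get)

# Hypothetical toy corpus, for illustration only.
TRAIN = [
    ("win cash prize now".split(), "spam"),
    ("cheap cash offer".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]

if __name__ == "__main__":
    model = train_nb(TRAIN)
    print(classify(model, "cash prize meeting".split()))
```

Without smoothing (`alpha=0`), the word "meeting" would have zero probability under the spam class and the log would be undefined, which is the sparse data problem the summary refers to.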