A Spam Filter approach with the Improved Machine Learning Technology
Published in | Third International Conference on Natural Computation (ICNC 2007) Vol. 2; pp. 484 - 488 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.08.2007 |
Subjects | |
Summary: | This paper presents an improved machine learning technique for automatically filtering spam. The work has three parts. First, to overcome the sparse data problem, which affects almost all corpus-based supervised learning classifiers, we propose incorporating a smoothing algorithm into the Naive Bayes (NB) model. Second, because the support vector machine (SVM) is well suited to classification in high-dimensional feature spaces and generalizes well from small samples, a high-efficiency SVM is presented that reduces the storage requirement and the computational time complexity. Third, a specialty word extraction algorithm based on information entropy is presented, which improves the performance of Chinese word segmentation and, accordingly, spam filtering in both English and Chinese. Experiments show that the improved NB gains 1.38% in precision, and that its precision is 0.69% higher than that of the SVM. |
---|---|
ISBN: | 9780769528755 0769528759 |
ISSN: | 2157-9555 |
DOI: | 10.1109/ICNC.2007.143 |
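
The summary mentions adding a smoothing algorithm to Naive Bayes to counter sparse data, but this record does not say which smoothing method the paper uses. The following is only a minimal sketch of the general idea, assuming additive (Laplace) smoothing on a multinomial NB model. The function names (`train_nb`, `log_likelihood`, `classify`), the `alpha` parameter, and the toy messages are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs.

    alpha is the additive smoothing constant (an assumption: the paper's
    actual smoothing algorithm is not described in this record).
    """
    class_docs = Counter()   # number of documents per class
    word_counts = {}         # per-class token counts
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_docs.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_docs},
        "log_prior": {c: math.log(n / total_docs) for c, n in class_docs.items()},
    }


def log_likelihood(model, token, label):
    """Smoothed log P(token | label); never -inf for in-vocabulary tokens."""
    alpha = model["alpha"]
    numerator = model["counts"][label][token] + alpha
    denominator = model["totals"][label] + alpha * len(model["vocab"])
    return math.log(numerator / denominator)


def classify(model, tokens):
    """Return the label with the highest posterior log score."""
    scores = {}
    for label, prior in model["log_prior"].items():
        # Tokens never seen in training are skipped in this sketch.
        known = [t for t in tokens if t in model["vocab"]]
        scores[label] = prior + sum(log_likelihood(model, t, label) for t in known)
    return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
training = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "now"], "spam"),
    (["meeting", "schedule", "now"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_nb(training)
print(classify(model, ["cheap", "meeting", "pills"]))
```

In the toy data the token "meeting" never occurs in a spam message, so an unsmoothed estimate would assign it probability zero and rule out the spam class outright. With additive smoothing every in-vocabulary token keeps a small nonzero probability in each class, which is the sparse-data effect the summary refers to.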