Enhancing email classification using data reduction and disagreement-based semi-supervised learning

Bibliographic Details
Published in	2014 IEEE International Conference on Communications (ICC), pp. 622-627
Main Authors	Yuxin Meng, Wenjuan Li, Lam-For Kwok
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2014
Subjects	Accuracy; Data models; Data Reduction; Disagreement-based Semi-Supervised Learning; Electronic mail; Email Classification; Machine Learning; Semisupervised learning; Support vector machines; Training; Vegetation

Summary:	Email classification, which aims to correctly classify user emails and filter out spam, is an important topic in the literature. In this paper, we identify several challenges in this area and propose an effective email classification model based on both data reduction and disagreement-based semi-supervised learning. The data reduction step selects an optimal set of email features and discards irrelevant data, while the disagreement-based approach improves the accuracy of spam detection by automatically exploiting unlabeled data. We evaluate the proposed model on two public datasets and one private dataset. The experimental results show that the model improves overall email classification performance by increasing detection accuracy and reducing false rates.
ISSN:	1550-3607 1938-1883
DOI:	10.1109/ICC.2014.6883388
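
Illustrative sketch (not taken from the paper): the summary combines two ideas, a data reduction step that keeps only informative email features and a disagreement-based semi-supervised step in which classifiers label unlabeled emails for each other. The Python below is a minimal sketch of that general idea, assuming co-training over two feature views as the disagreement-based technique and a small Naive Bayes classifier as the base learner. The record does not say which algorithms, features, or thresholds the authors used, so every name here (`select_features`, `NaiveBayes`, `co_train`, `margin`) and the toy emails are hypothetical.

```python
import math
from collections import Counter

def select_features(docs, labels, k):
    """Data reduction (assumed criterion): keep the k tokens whose
    document frequency differs most between spam (1) and ham (0)."""
    spam, ham = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (spam if y == 1 else ham).update(set(tokens))
    vocab = set(spam) | set(ham)
    # Sort by descending frequency gap, then alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-abs(spam[t] - ham[t]), t))[:k]

class NaiveBayes:
    """Multinomial Naive Bayes restricted to one feature view."""
    def __init__(self, features):
        self.features = set(features)

    def fit(self, docs, labels):
        self.n = Counter(labels)                    # emails per class
        self.counts = {0: Counter(), 1: Counter()}  # token counts per class
        for tokens, y in zip(docs, labels):
            self.counts[y].update(t for t in tokens if t in self.features)
        return self

    def log_odds(self, tokens):
        """Log-odds of spam versus ham; positive means spam."""
        v = len(self.features)
        tot1 = sum(self.counts[1].values())
        tot0 = sum(self.counts[0].values())
        score = math.log((self.n[1] + 1) / (self.n[0] + 1))
        for t in tokens:
            if t in self.features:
                p1 = (self.counts[1][t] + 1) / (tot1 + v)  # Laplace smoothing
                p0 = (self.counts[0][t] + 1) / (tot0 + v)
                score += math.log(p1 / p0)
        return score

def co_train(labeled, labels, unlabeled, k=6, rounds=5, margin=0.5):
    """Co-training: two classifiers on disjoint feature views take turns
    pseudo-labeling their most confident unlabeled email."""
    docs, ys, pool = list(labeled), list(labels), list(unlabeled)
    feats = select_features(docs, ys, k)
    views = (feats[0::2], feats[1::2])              # two disjoint views
    for _ in range(rounds):
        models = [NaiveBayes(v).fit(docs, ys) for v in views]
        added = False
        for m in models:
            if not pool:
                break
            best = max(pool, key=lambda d: abs(m.log_odds(d)))
            score = m.log_odds(best)
            if abs(score) >= margin:                # confident enough to trust
                pool.remove(best)
                docs.append(best)
                ys.append(1 if score > 0 else 0)
                added = True
        if not added:
            break
    models = [NaiveBayes(v).fit(docs, ys) for v in views]
    # Final decision: sum the log-odds of both views.
    return lambda tokens: 1 if sum(m.log_odds(tokens) for m in models) > 0 else 0

# Toy data, invented purely for illustration.
labeled = [["win", "cash", "now"], ["free", "prize", "cash"],
           ["meeting", "agenda", "today"], ["project", "report", "agenda"]]
labels = [1, 1, 0, 0]
unlabeled = [["free", "cash", "offer"], ["report", "meeting", "notes"],
             ["win", "prize", "offer"]]

classify = co_train(labeled, labels, unlabeled)
print(classify(["free", "cash", "prize"]), classify(["meeting", "agenda", "report"]))
```

In this sketch the pseudo-labeled emails go into a shared training set, so each view benefits from what the other was confident about. A real system would more likely use a library classifier, a principled feature selection score, and a held-out set to tune `margin`.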