A Safe Approach to Shrink Email Sample Set while Keeping Balance between Spam and Normal

Bibliographic Details
Published in	2009 Third IEEE International Conference on Secure Software Integration and Reliability Improvement, pp. 329-334
Main Authors	Lili Diao, Hao Wang
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2009
Subjects	anti-spam; Conferences; Engines; Industry applications; Machine learning; Software safety; Software testing; Software tools; Support vector machine classification; Support vector machines; SVM; Unsolicited electronic mail

More Information
Summary:	To handle the full range of cases that arise when training anti-spam machine learning models, it is crucial to design a safe way to shrink the training sample set by reducing redundancy with minimal loss of information for classification, while also balancing the distribution of samples. At present, no such solution exists. In this paper, we propose a safe approach that addresses these problems and improves the quality of the training email sample pool (set), so that higher-quality machine learning models can be obtained for a better anti-spam engine with unbiased, high spam detection rates and low false positive rates.
ISBN:	0769537588, 9780769537580
DOI:	10.1109/SSIRI.2009.66