Extracting discriminative information from e-mail for spam detection inspired by Immune System

Bibliographic Details
Published in	IEEE Congress on Evolutionary Computation, pp. 1-7
Main Authors	Yuanchun Zhu, Ying Tan
Format	Conference Proceeding
Language	English
Published	IEEE 01.07.2010
Subjects	Accuracy; Construction industry; Electronic mail; Feature extraction; Libraries; Pathogens; Training
Summary:	Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for anti-spam. A general anti-spam model is built to incorporate the LC approach with term selection methods and classifiers. In the LC model, each message is divided into areas by a sliding window. For each area, a two-dimensional feature is constructed by calculating the concentrations of spam and legitimate email. The features of all areas are then combined into a single feature vector. Experiments are conducted on four benchmark corpora using 10-fold cross-validation. The results show that the LC approach can extract effective position-correlated information from messages. Compared to the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, the LC approach greatly reduces feature dimensionality and runs much faster.
ISBN:	1424469090 9781424469093
ISSN:	1089-778X 1941-0026
DOI:	10.1109/CEC.2010.5586290
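
Illustrative sketch: the windowing step described in the Summary can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The record does not say how the concentrations are computed or how the window is sized, so the code assumes (1) two hypothetical term sets, `spam_terms` and `ham_terms`, produced by some earlier term selection step, (2) a message split into a fixed number of contiguous areas, `n_areas`, and (3) a concentration defined as the fraction of an area's distinct terms that appear in the corresponding term set. All function and parameter names are invented for illustration.

```python
from typing import List, Sequence, Set


def local_concentrations(
    tokens: Sequence[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    n_areas: int = 3,
) -> List[float]:
    """Return [spam_1, ham_1, ..., spam_n, ham_n] for one tokenized message.

    Assumption: each concentration is the share of distinct terms in an
    area that also occur in the given term set. The record does not state
    the exact definition used in the paper.
    """
    features: List[float] = []
    total = len(tokens)
    for i in range(n_areas):
        # Area i covers tokens[start:end]; areas are contiguous and
        # together span the whole message.
        start = i * total // n_areas
        end = (i + 1) * total // n_areas
        window = set(tokens[start:end])  # distinct terms in this area
        if not window:
            # Empty area (message shorter than n_areas): no evidence.
            features.extend([0.0, 0.0])
            continue
        spam_conc = len(window & spam_terms) / len(window)
        ham_conc = len(window & ham_terms) / len(window)
        features.extend([spam_conc, ham_conc])
    return features


if __name__ == "__main__":
    # Toy data, made up for this example only.
    message = ["free", "cash", "now", "meeting", "agenda", "attached"]
    spam = {"free", "cash", "now"}
    ham = {"meeting", "agenda", "attached"}
    print(local_concentrations(message, spam, ham, n_areas=2))
    # prints [1.0, 0.0, 0.0, 1.0]
```

With this layout the feature vector always has 2 * n_areas entries, regardless of vocabulary size, which is consistent with the Summary's point about reduced dimensionality relative to Bag-of-Words. Because each pair of entries is tied to one area of the message, the vector also keeps coarse positional information.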