Extracting discriminative information from e-mail for spam detection inspired by Immune System
Published in | IEEE Congress on Evolutionary Computation pp. 1 - 7 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.07.2010 |
Summary: | Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for anti-spam. A general anti-spam model is built to combine the LC approach with term selection methods and classifiers. In the LC model, each message is divided into areas by a sliding window. In each area, a two-dimensional feature is constructed by calculating the concentrations of spam and legitimate email. The features of all areas are then combined into a single feature vector. Experiments are conducted on four benchmark corpora using 10-fold cross-validation. The results show that the LC approach can extract effective position-correlated information from messages. Compared to the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, the LC approach greatly reduces feature dimensionality and runs much faster. |
---|---|
ISBN: | 1424469090 9781424469093 |
ISSN: | 1089-778X 1941-0026 |
DOI: | 10.1109/CEC.2010.5586290 |
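
The feature construction described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_concentrations`, the two term sets, the choice of a fixed number of non-overlapping areas, and the definition of a concentration as the fraction of distinct terms in an area that belong to a term set are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def local_concentrations(
    tokens: Sequence[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    n_areas: int = 3,
) -> List[float]:
    """Return a 2 * n_areas vector of (spam, legitimate) concentrations.

    Assumption: a concentration is the share of distinct terms in an area
    that also occur in the given term set. The record does not define it.
    """
    features: List[float] = []
    n = len(tokens)
    for i in range(n_areas):
        # Assumed windowing: the window moves over the message in n_areas
        # non-overlapping steps, so every message yields the same vector length.
        start = i * n // n_areas
        end = (i + 1) * n // n_areas
        area = set(tokens[start:end])  # distinct terms in this area
        if not area:
            # Empty area (very short message): no evidence either way.
            features.extend([0.0, 0.0])
            continue
        spam_conc = len(area & spam_terms) / len(area)
        ham_conc = len(area & ham_terms) / len(area)
        # One two-dimensional feature per area, appended in message order.
        features.extend([spam_conc, ham_conc])
    return features


# Hypothetical term sets; in practice a term selection method would build them.
spam_terms = {"free", "winner", "cash"}
ham_terms = {"meeting", "report", "agenda"}
message = "free cash winner please see the meeting agenda report".split()

vector = local_concentrations(message, spam_terms, ham_terms, n_areas=3)
print(vector)
```

With three areas the vector has only six dimensions regardless of vocabulary size, which is consistent with the summary's claim of reduced dimensionality relative to Bag-of-Words. The vector would then be passed to any standard classifier.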