Extracting discriminative information from e-mail for spam detection inspired by Immune System


Bibliographic Details
Published in IEEE Congress on Evolutionary Computation, pp. 1-7
Main Authors Yuanchun Zhu, Ying Tan
Format Conference Proceeding
Language English
Published IEEE 01.07.2010

Summary: Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for anti-spam. A general anti-spam model is built to incorporate the LC approach with term selection methods and classifiers. In the LC model, each message is divided into areas by a sliding window. In each area, a two-dimensional feature is constructed by calculating the concentrations of spam and legitimate e-mail. The features of all areas are then combined into a single feature vector. Experiments are conducted on four benchmark corpora using 10-fold cross-validation. They show that the LC approach can extract effective position-correlated information from messages. Compared to the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, the LC approach greatly reduces feature dimensionality and runs much faster.
ISBN:1424469090
9781424469093
ISSN:1089-778X
1941-0026
DOI:10.1109/CEC.2010.5586290
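
The sliding-window construction described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not say how concentrations are computed or how windows are placed, so the sketch makes several assumptions: the names `local_concentrations`, `spam_terms`, `ham_terms` and `n_areas` are hypothetical, the windows are taken as contiguous non-overlapping slices of near-equal length, and each concentration is taken to be the fraction of distinct tokens in a window that appear in a pre-selected spam or legitimate term set.

```python
from typing import List, Set


def local_concentrations(
    tokens: List[str],
    spam_terms: Set[str],
    ham_terms: Set[str],
    n_areas: int = 3,
) -> List[float]:
    """Return a feature vector of length 2 * n_areas.

    Each area contributes a (spam concentration, legitimate concentration)
    pair. The term sets stand in for the output of a term selection step.
    """
    features: List[float] = []
    n = len(tokens)
    for i in range(n_areas):
        # Assumption: areas are contiguous, non-overlapping, near-equal slices.
        start = i * n // n_areas
        end = (i + 1) * n // n_areas
        distinct = set(tokens[start:end])
        if not distinct:
            # An empty area carries no evidence for either class.
            features.extend([0.0, 0.0])
            continue
        # Assumption: concentration = share of distinct tokens found in a set.
        spam_conc = len(distinct & spam_terms) / len(distinct)
        ham_conc = len(distinct & ham_terms) / len(distinct)
        features.extend([spam_conc, ham_conc])
    return features


if __name__ == "__main__":
    message = "free money now hello meeting agenda".split()
    vector = local_concentrations(
        message,
        spam_terms={"free", "money"},
        ham_terms={"meeting", "agenda"},
        n_areas=3,
    )
    print(vector)  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Under these assumptions the vector length is `2 * n_areas` regardless of vocabulary size, which is consistent with the summary's claim of reduced dimensionality compared with Bag-of-Words, where the vector has one entry per selected term.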