GA-based feature subset selection in a spam/non-spam detection system

Bibliographic Details
Published in: 2012 International Conference on Computer and Communication Engineering, pp. 675-679
Main Authors: Behjat, Amir Rajabi; Mustapha, Aida; Nezamabadi-pour, Hossein; Sulaiman, M. N.; Mustapha, N.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2012
ISBN: 1467304786, 9781467304788
DOI: 10.1109/ICCCE.2012.6271302


Abstract Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to obscure the parts of a message that could be used to identify it as spam. In addition, sending junk mail costs a spammer little in money or bandwidth, even for more than one hundred emails. From the feature selection perspective, one specific problem that decreases the accuracy of spam and non-spam email classification is high data dimensionality. Reducing dimensionality therefore amounts to reducing the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection in an effort to decrease the number of useless features in a collection of high-dimensional email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify emails using the features selected by the GA. Using the LingSpam benchmark corpora as the dataset, the experimental results showed that a GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
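The GA-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no GA parameters, so the population size, mutation rate, tournament selection, one-point crossover, and size penalty below are all assumptions. The data is synthetic rather than LingSpam, and a simple nearest-centroid classifier stands in for the MLP so that the example runs with the Python standard library alone. Each chromosome is a bit mask over the features, and fitness rewards held-out accuracy while penalizing larger subsets.

```python
import random

def make_data(rng, n=80, n_informative=3, n_noise=12):
    """Synthetic two-class data (assumption: NOT LingSpam).
    Only the first n_informative features carry class signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Stand-in for the MLP: nearest-centroid accuracy on masked features.
    Centroids come from the first half of the rows; the second half is scored."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    half = len(X) // 2
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / (len(X) - half)

def ga_select(X, y, rng, pop_size=20, generations=30, p_mut=0.05, penalty=0.1):
    """Evolve a 0/1 feature mask; all hyperparameters are illustrative."""
    n = len(X[0])

    def fitness(mask):
        # Accuracy minus a penalty proportional to the fraction of features kept.
        return centroid_accuracy(X, y, mask) - penalty * sum(mask) / n

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)[:]]           # elitism: keep the best mask
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)   # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_data(rng)
    best = ga_select(X, y, rng)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
    print("held-out accuracy:", centroid_accuracy(X, y, best))
```

In a setup closer to the paper, the fitness function would presumably train and evaluate an MLP on the masked term features of each email instead of the centroid stand-in used here.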
Authors and affiliations
1. Behjat, Amir Rajabi (rajabi.amir6@gmail.com), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia
2. Mustapha, Aida (aida@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia
3. Nezamabadi-pour, Hossein (nezam@mail.uk.ac.ir), Dept. of Electr. Eng., Shahid Bahonar Univ. of Kerman, Kerman, Iran
4. Sulaiman, M. N. (nasir@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia
5. Mustapha, N. (norwati@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia
Subjects: Accuracy; Electronic mail; Feature extraction; Feature selection; Genetic algorithm; Genetic algorithms; MLP; Spam detection; Support vector machine classification; Training