GA-based feature subset selection in a spam/non-spam detection system
Published in | 2012 International Conference on Computer and Communication Engineering, pp. 675-679 |
---|---|
Main Authors | Behjat, Amir Rajabi; Mustapha, Aida; Nezamabadi-pour, Hossein; Sulaiman, M. N.; Mustapha, N. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2012 |
ISBN | 1467304786 9781467304788 |
DOI | 10.1109/ICCCE.2012.6271302 |
Abstract | Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to obscure the parts of messages that could be used to identify spam. For instance, a spammer incurs little cost and bandwidth in sending junk mail, even when sending more than one hundred emails. From the feature selection perspective, one specific problem that decreases the accuracy of spam/non-spam email classification is high data dimensionality; reducing dimensionality therefore means decreasing the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection in an effort to decrease the number of useless features in a collection of high-dimensional email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify the features that have been selected by the GA. Using the LingSpam benchmark corpus as the dataset, the experimental results show that a GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes. |
---|---|
Author | 1. Behjat, Amir Rajabi (rajabi.amir6@gmail.com), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia; 2. Mustapha, Aida (aida@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia; 3. Nezamabadi-pour, Hossein (nezam@mail.uk.ac.ir), Dept. of Electr. Eng., Shahid Bahonar Univ. of Kerman, Kerman, Iran; 4. Sulaiman, M. N. (nasir@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia; 5. Mustapha, N. (norwati@fsktm.upm.edu.my), Fac. of Comput. Sci. & Inf. Technol., Univ. Putra Malaysia, Serdang, Malaysia |
EISBN | 1467304794 1467304778 9781467304771 9781467304795 |
SubjectTerms | Accuracy; Electronic mail; Feature extraction; Feature selection; Genetic algorithm; Genetic algorithms; MLP; Spam detection; Support vector machine classification; Training |
URI | https://ieeexplore.ieee.org/document/6271302 |
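The abstract describes a genetic algorithm that searches over feature subsets, with a classifier scoring each candidate subset. The record does not give the paper's encoding, operators, or parameters, so the following is only a minimal sketch of the general technique under stated assumptions: a bit-mask chromosome, tournament selection, one-point crossover, bit-flip mutation, and elitism. A nearest-centroid classifier on synthetic data stands in for the MLP and the LingSpam corpus, and every function name and parameter value here is hypothetical.

```python
import random

def make_toy_data(rng, n_samples=200, n_features=12, informative=(0, 1, 2)):
    """Synthetic stand-in for an email corpus (assumption, not LingSpam)."""
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # only these columns carry class signal
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Stand-in for the MLP: nearest-centroid accuracy on the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    half = len(X) // 2  # first half trains, second half tests
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        if not rows:
            return 0.0
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in range(half, len(X)):
        dists = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cen))
                 for c, cen in centroids.items()}
        correct += min(dists, key=dists.get) == y[i]
    return correct / (len(X) - half)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)

def tournament(X, y, pop, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(X, y, a) >= fitness(X, y, b) else b

def ga_select(X, y, rng, pop_size=20, generations=30, p_mut=0.05):
    """Evolve a 0/1 mask over the feature columns and return the best one found."""
    n = len(X[0])
    # Seed the population with the full feature set as a baseline individual.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = max(pop, key=lambda m: fitness(X, y, m))
    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            p1, p2 = tournament(X, y, pop, rng), tournament(X, y, pop, rng)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda m: fitness(X, y, m))
    return best

rng = random.Random(0)
X, y = make_toy_data(rng)
best = ga_select(X, y, rng)
print("selected columns:", [j for j, bit in enumerate(best) if bit])
```

Because the full feature set is placed in the initial population and the best mask is carried forward each generation, the returned subset never scores worse than using all features under this fitness function. To approximate the setup in the abstract, `centroid_accuracy` would be replaced by the validation accuracy of an MLP trained on the selected columns.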