On the Validity of a New SMS Spam Collection
Published in | 2012 Eleventh International Conference on Machine Learning and Applications Vol. 2; pp. 240 - 245 |
---|---|
Main Authors | Almeida, T. A.; Hidalgo, J. M. G.; Yamakami, A. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2012 |
ISBN | 1467346519 9781467346511 |
DOI | 10.1109/ICMLA.2012.211 |
Abstract | Mobile phones are becoming the latest target of electronic junk mail. Recent reports clearly indicate that the volume of SMS spam messages is increasing dramatically year by year. One of the major concerns in academic settings has been the scarcity of public SMS spam datasets, which are sorely needed for validating and comparing different classifiers. To address this issue, we have recently proposed a new SMS Spam Collection that, to the best of our knowledge, is the largest public, real SMS dataset available for academic studies. However, as it was created by augmenting a previously existing database built from roughly the same sources, it is sensible to verify that the two sources contribute no duplicates. In this paper we offer a comprehensive analysis of the new SMS Spam Collection to ensure that this does not happen, since duplicates may ease the task of learning SMS spam classifiers and hence could compromise the evaluation of methods. The analysis of the results indicates that the procedure followed does not lead to near-duplicates and, consequently, that the proposed dataset is reliable for evaluating and comparing the performance achieved by different classifiers. |
---|---|
Author | Almeida, T. A. Hidalgo, J. M. G. Yamakami, A. |
Author_xml | 1. Hidalgo, J. M. G. (jgomez@optenet.com), R&D Dept., Optenet, Madrid, Spain; 2. Almeida, T. A. (talmeida@ufscar.br), Dept. of Comput. Sci., Fed. Univ. of Sao Carlos - UFSCar, Sorocaba, Brazil; 3. Yamakami, A. (akebo@dt.fee.unicamp.br), Sch. of Electr. & Comput. Eng., Univ. of Campinas - UNICAMP, Sao Paulo, Brazil |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
Discipline | Computer Science |
EISBN | 9780769549132 0769549136 |
EndPage | 245 |
ExternalDocumentID | 6406757 |
Genre | orig-research |
ISBN | 1467346519 9781467346511 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 6 |
PublicationCentury | 2000 |
PublicationDate | 2012-Dec. |
PublicationDateYYYYMMDD | 2012-12-01 |
PublicationDate_xml | – month: 12 year: 2012 text: 2012-Dec. |
PublicationDecade | 2010 |
PublicationTitle | 2012 Eleventh International Conference on Machine Learning and Applications |
PublicationTitleAbbrev | icmla |
PublicationYear | 2012 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 240 |
SubjectTerms | Artificial neural networks; Cellular phones; Classification; Mobile communication; Mobile handsets; Mobile spam; Spam filtering; Text analysis; Text categorization; Unsolicited electronic mail |
Title | On the Validity of a New SMS Spam Collection |
URI | https://ieeexplore.ieee.org/document/6406757 |
Volume | 2 |