On the Validity of a New SMS Spam Collection

Bibliographic Details
Published in 2012 Eleventh International Conference on Machine Learning and Applications, Vol. 2; pp. 240-245
Main Authors Hidalgo, J. M. G., Almeida, T. A., Yamakami, A.
Format Conference Proceeding
Language English
Published IEEE 01.12.2012
ISBN 1467346519, 9781467346511
DOI 10.1109/ICMLA.2012.211

Abstract Mobile phones are becoming the latest target of electronic junk mail. Recent reports clearly indicate that the volume of SMS spam messages is increasing dramatically year by year. One of the major concerns in academic settings has probably been the scarcity of public SMS spam datasets, which are sorely needed for the validation and comparison of different classifiers. To address this issue, we have recently proposed a new SMS Spam Collection that, to the best of our knowledge, is the largest public, real SMS dataset available for academic studies. However, as it was created by augmenting a previously existing database built from roughly the same sources, it is sensible to verify that no duplicates come from them. In this paper we offer a comprehensive analysis of the new SMS Spam Collection to ensure that this does not happen, since duplicates may ease the task of learning SMS spam classifiers and hence could compromise the evaluation of methods. The analysis of results indicates that the procedure followed does not lead to near-duplicates and, consequently, that the proposed dataset is reliable for evaluating and comparing the performance achieved by different classifiers.
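The abstract does not say how near-duplicates were detected, so the following is only a minimal sketch of one common approach, not the authors' procedure: messages are normalised, split into word shingles, and compared pairwise by Jaccard similarity. The function names, the shingle size `k`, and the `threshold` value are illustrative assumptions.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased message.

    Punctuation is dropped; k=3 is an arbitrary illustrative choice.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short messages become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0  # two empty messages are treated as identical
    return len(a & b) / len(a | b)


def near_duplicates(messages, threshold=0.6, k=3):
    """Return index pairs (i, j), i < j, whose similarity reaches threshold.

    The threshold is a hypothetical value, not taken from the paper.
    """
    sets = [shingles(m, k) for m in messages]
    return [
        (i, j)
        for i, j in combinations(range(len(messages)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]
```

The pairwise comparison is quadratic in the number of messages, which is usually acceptable for a collection of a few thousand SMS messages; larger corpora would typically call for hashing-based methods instead.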
Author_xml – sequence: 1
  givenname: J. M. G.
  surname: Hidalgo
  fullname: Hidalgo, J. M. G.
  email: jgomez@optenet.com
  organization: R&D Dept., Optenet, Madrid, Spain
– sequence: 2
  givenname: T. A.
  surname: Almeida
  fullname: Almeida, T. A.
  email: talmeida@ufscar.br
  organization: Dept. of Comput. Sci., Fed. Univ. of Sao Carlos - UFSCar, Sorocaba, Brazil
– sequence: 3
  givenname: A.
  surname: Yamakami
  fullname: Yamakami, A.
  email: akebo@dt.fee.unicamp.br
  organization: Sch. of Electr. & Comput. Eng., Univ. of Campinas - UNICAMP, Sao Paulo, Brazil
CODEN IEEPAD
Discipline Computer Science
EISBN 9780769549132, 0769549136
SubjectTerms Artificial neural networks
Cellular phones
Classification
Mobile communication
Mobile handsets
Mobile spam
Spam filtering
Text analysis
Text categorization
Unsolicited electronic mail
URI https://ieeexplore.ieee.org/document/6406757