Ensemble Clustering for Internet Security Applications
Published in | IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews Vol. 42; no. 6; pp. 1784 - 1796 |
---|---|
Main Authors | Weiwei Zhuang, Yanfang Ye, Yong Chen, Tao Li |
Format | Journal Article |
Language | English |
Published | New York, NY: IEEE / Institute of Electrical and Electronics Engineers, 01.11.2012 |
Abstract | Because of the damage they cause, malware and phishing websites have become Internet security topics of great interest. Compared with malware attacks, phishing website fraud is a relatively new Internet crime. However, the two share some common properties: 1) both malware samples and phishing websites are created at a rate of thousands per day, driven by economic benefits; and 2) phishing websites represented by the term frequencies of the webpage content share similar characteristics with malware samples represented by the instruction frequencies of the program. Over the past few years, many clustering techniques have been employed for automatic malware and phishing website detection. In these techniques, the detection process is generally divided into two steps: 1) feature extraction, where representative features are extracted to capture the characteristics of the file samples or the websites; and 2) categorization, where intelligent techniques are used to automatically group the file samples or websites into different classes based on computational analysis of the feature representations. However, few of these techniques have been applied in real industry products. In this paper, we develop an automatic categorization system that groups phishing websites or malware samples using a cluster ensemble, aggregating the clustering solutions generated by different base clustering algorithms. We propose a principled cluster ensemble framework that combines individual clustering solutions based on the consensus partition and that can be applied not only to malware categorization but also to phishing website clustering. In addition, domain knowledge in the form of sample-level/website-level constraints can be naturally incorporated into the ensemble framework. Case studies on large, real daily collections of phishing websites and malware from the Kingsoft Internet Security Laboratory demonstrate the effectiveness and efficiency of the proposed method. |
---|---|
Author | Tao Li, Yanfang Ye, Weiwei Zhuang, Yong Chen |
Author_xml | – sequence: 1 givenname: Weiwei surname: Zhuang fullname: Zhuang, Weiwei – sequence: 2 givenname: Yanfang surname: Ye fullname: Ye, Yanfang – sequence: 3 givenname: Yong surname: Chen fullname: Chen, Yong – sequence: 4 givenname: Tao surname: Li fullname: Li, Tao |
BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=26818941$$DView record in Pascal Francis |
CODEN | ITCRFH |
CitedBy_id | crossref_primary_10_1016_j_knosys_2016_10_003 crossref_primary_10_1016_j_asoc_2016_01_043 crossref_primary_10_1016_j_cose_2023_103561 crossref_primary_10_1016_j_ijleo_2015_10_078 crossref_primary_10_1109_TCC_2015_2481378 crossref_primary_10_1109_TKDE_2015_2499200 crossref_primary_10_1109_TKDE_2023_3292573 crossref_primary_10_1109_TNSM_2022_3162885 crossref_primary_10_1093_jigpal_jzw047 crossref_primary_10_1016_j_pmcj_2015_06_006 crossref_primary_10_7717_peerj_cs_2487 crossref_primary_10_1109_TCYB_2018_2809562 crossref_primary_10_1016_j_egypro_2019_01_821 crossref_primary_10_1002_ett_4771 crossref_primary_10_1016_j_cose_2017_04_006 crossref_primary_10_1002_dac_3225 crossref_primary_10_3390_en15197419 crossref_primary_10_4018_IJAMC_2018070101 crossref_primary_10_1016_j_cosrev_2018_01_003 crossref_primary_10_1016_j_comnet_2013_04_005 crossref_primary_10_1155_2019_6271017 crossref_primary_10_1007_s11416_024_00513_5 crossref_primary_10_1142_S0218213023600059 crossref_primary_10_1007_s10044_020_00872_x crossref_primary_10_1109_TKDE_2015_2426713 crossref_primary_10_1093_jigpal_jzu035 crossref_primary_10_1007_s11704_019_8208_z crossref_primary_10_1080_01969722_2013_803903 crossref_primary_10_1016_j_compeleceng_2019_07_023 crossref_primary_10_1109_TSMC_2017_2700495 crossref_primary_10_1109_TCYB_2016_2569529 crossref_primary_10_1109_TKDE_2018_2818729 |
Cites_doi | 10.1007/s11416-008-0082-4 10.1145/1250734.1250746 10.1002/9780470316801.ch2 10.1109/ICPR.2010.1010 10.1109/MALWARE.2008.4690860 10.1137/1.9781611972788.72 10.1007/978-3-540-74320-0_10 10.1145/1281192.1281308 10.1007/978-3-540-70542-0_6 10.1023/A:1023949509487 10.1109/SECPRI.2001.924286 10.1145/1242572.1242659 10.1109/ITNG.2010.117 10.1109/UIC-ATC.2009.62 10.1109/TPAMI.2005.237 10.1201/9781584889977 10.1109/TNN.2005.845141 10.1145/1299015.1299021 10.1145/1557019.1557167 10.1109/ECRIME.2009.5342614 10.1016/j.csda.2008.10.015 10.1145/505282.505283 10.1145/900051.900096 10.1007/978-3-540-74565-5_5 10.1145/1015330.1015414 10.1145/1595676.1595686 10.1109/ACSAC.2007.21 10.1109/SP.2010.11 10.1007/978-3-642-15037-1_20 10.1109/ICDE.2005.34 10.1145/1835804.1835820 |
ContentType | Journal Article |
Copyright | 2014 INIST-CNRS |
Copyright_xml | – notice: 2014 INIST-CNRS |
DOI | 10.1109/TSMCC.2012.2222025 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library (IEL) (UW System Shared) CrossRef Pascal-Francis Computer and Information Systems Abstracts Electronics & Communications Abstracts Mechanical & Transportation Engineering Abstracts Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Mechanical & Transportation Engineering Abstracts Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Engineering Research Database Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Sciences (General) Applied Sciences |
EISSN | 1558-2442 |
EndPage | 1796 |
ExternalDocumentID | 26818941 10_1109_TSMCC_2012_2222025 6392461 |
Genre | orig-research |
ISSN | 1094-6977 |
IngestDate | Fri Jul 11 11:31:31 EDT 2025 Wed Apr 02 07:26:17 EDT 2025 Thu Apr 24 22:52:18 EDT 2025 Tue Jul 01 03:52:42 EDT 2025 Tue Aug 26 17:18:15 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 6 |
Keywords | Cluster analysis; malware categorization; Fraud; Economic sciences; Aggregate model; phishing website detection; Computer virus; Efficiency; News; Selection criterion; Computer security; Pattern extraction; Computer attack; Damaging; Data analysis; Cluster; Pattern recognition; Distributed system; Consensus; Automatic measurement; Distributed algorithm; Cluster ensemble; Internet; Web site; Intrusion detection systems |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html CC BY 4.0 |
LinkModel | DirectLink |
Notes | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23 |
PQID | 1315698408 |
PQPubID | 23500 |
PageCount | 13 |
ParticipantIDs | crossref_primary_10_1109_TSMCC_2012_2222025 proquest_miscellaneous_1315698408 crossref_citationtrail_10_1109_TSMCC_2012_2222025 ieee_primary_6392461 pascalfrancis_primary_26818941 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2012-11-01 |
PublicationDateYYYYMMDD | 2012-11-01 |
PublicationDate_xml | – month: 11 year: 2012 text: 2012-11-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | New-York, NY |
PublicationPlace_xml | – name: New-York, NY |
PublicationTitle | IEEE transactions on systems, man and cybernetics. Part C, Applications and reviews |
PublicationTitleAbbrev | TSMCC |
PublicationYear | 2012 |
Publisher | IEEE Institute of Electrical and Electronics Engineers |
Publisher_xml | – name: IEEE – name: Institute of Electrical and Electronics Engineers |
StartPage | 1784 |
SubjectTerms | Applied sciences Cluster ensemble Clustering Clustering algorithms Clusters Computer science; control theory; systems Computer systems and distributed systems. User interface Data mining Data processing. List processing. Character string processing Exact sciences and technology Feature extraction Internet Knowledge engineering Malware malware categorization Mathematical models Memory and file management (including protection and security) Memory organisation. Data processing Phishing phishing website detection Security Software Websites |
Title | Ensemble Clustering for Internet Security Applications |
URI | https://ieeexplore.ieee.org/document/6392461 https://www.proquest.com/docview/1315698408 |
Volume | 42 |
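
The abstract describes aggregating the clustering solutions of several base algorithms into a consensus partition, optionally guided by sample-level constraints. The record does not give the paper's actual algorithm, so the following is only a minimal sketch of one common way to build a consensus: a co-association (evidence accumulation) matrix that is thresholded and merged with union-find. The function names, the `threshold` parameter, and the way must-link and cannot-link pairs are applied are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def co_association(partitions, n):
    """Fraction of base partitions that place each pair (i, j) in the same cluster."""
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1.0
                counts[j][i] += 1.0
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, n, must_link=(), cannot_link=(), threshold=0.5):
    """Hypothetical consensus step: merge samples whose co-association exceeds threshold.

    must_link pairs are forced to similarity 1.0 and cannot_link pairs to 0.0.
    Because merging is transitive, a cannot_link pair is only a soft hint here:
    the two samples can still end up together through a third sample.
    """
    sim = co_association(partitions, n)
    for i, j in must_link:
        sim[i][j] = sim[j][i] = 1.0
    for i, j in cannot_link:
        sim[i][j] = sim[j][i] = 0.0

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three made-up base clusterings of six samples (cluster ids are arbitrary per run).
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(consensus_partition(base, 6))  # → [0, 0, 0, 1, 1, 1]
```

In this toy input, samples 2 and 3 are grouped together by only one of the three base clusterings, so the majority view keeps them apart; passing `must_link=[(2, 3)]` would merge everything into a single cluster. A production system would likely use a more careful consensus objective and enforce cannot-link constraints strictly.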