SPWalk: Similar Property Oriented Feature Learning for Phishing Detection
Detecting phishing webpages is an essential task that protects legitimate websites and their users from various malicious activities. To classify the suspect webpage as phishing or legitimate, robust and effective features used for classification are in demand. However, recent phishing attacks usual...
Published in | IEEE access Vol. 8; pp. 87031 - 87045 |
---|---|
Main Authors | Liu, Xiuwen; Fu, Jianming |
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020 |
Subjects | |
Abstract | Detecting phishing webpages is an essential task that protects legitimate websites and their users from various malicious activities. Classifying a suspect webpage as phishing or legitimate requires robust and effective features. However, recent phishing attacks usually make phishing webpages resemble legitimate webpages in both visual and functional aspects, which makes feature extraction more difficult. We herein propose SPWalk, an unsupervised feature learning algorithm for phishing detection. In SPWalk, similar property nodes refer to a collection of phishing webpages or of legitimate webpages. We first construct a weblink network with nodes representing webpages. The edges between nodes represent the reference relationships that connect webpages through hyperlinks or similar textual content. Then, SPWalk applies a network embedding technique to map nodes into a low-dimensional vector space. A biased random walk procedure efficiently integrates both the structural information between nodes and the URL information of each node. The effectiveness and robustness of SPWalk come from three points. (1) Phishing attackers do not have full control over reference relationships. (2) The structural regularities generated by diverse reference relationships can be exploited to discriminate between phishing and legitimate webpages. (3) Node URL information makes the learned node representations better suited for phishing detection. Using the learned node representations as numeric features, we conduct experiments to classify webpages as legitimate or phishing. We demonstrate the superiority of SPWalk over state-of-the-art techniques on phishing detection, especially in terms of precision (over 95%). Even when phishing webpages are well camouflaged by attackers to evade detection, SPWalk consistently exhibits better classification efficacy. |
---|---|
Author | Fu, Jianming Liu, Xiuwen |
Author_xml | – sequence: 1 givenname: Xiuwen orcidid: 0000-0002-6202-1937 surname: Liu fullname: Liu, Xiuwen organization: Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China – sequence: 2 givenname: Jianming orcidid: 0000-0002-4639-5824 surname: Fu fullname: Fu, Jianming email: jmfu@whu.edu.cn organization: Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China |
CODEN | IAECCG |
CitedBy_id | crossref_primary_10_1177_18724981251321395 crossref_primary_10_3233_JIFS_223569 crossref_primary_10_1016_j_comnet_2024_110398 crossref_primary_10_1016_j_eswa_2023_119723 crossref_primary_10_1109_ACCESS_2022_3166474 crossref_primary_10_1016_j_jksuci_2023_01_004 |
Cites_doi | 10.3758/BF03193020 10.1016/j.eswa.2018.09.040 10.1007/978-3-319-66402-6_22 10.1145/1314389.1314391 10.1145/1557019.1557153 10.1145/2939672.2939754 10.1007/s11280-013-0250-4 10.1145/2736277.2741093 10.1145/2806416.2806512 10.1109/INFCOM.2011.5934995 10.1016/j.eswa.2016.01.028 10.1109/INFCOM.2010.5462216 10.1145/2623330.2623732 10.1109/ACCESS.2019.2893980 10.1016/j.future.2009.07.012 10.1145/2976749.2978387 10.1145/1553374.1553462 10.24963/ijcai.2017/544 10.1016/j.eswa.2010.04.044 10.1016/j.dss.2016.05.005 10.1016/j.cose.2015.07.006 10.18653/v1/P17-1158 10.1109/TDSC.2006.50 10.1145/1242572.1242659 10.3115/v1/D14-1162 10.1109/TNN.2011.2161999 10.1109/SP.2011.25 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
DBID | 97E ESBDL RIA RIE AAYXX CITATION 7SC 7SP 7SR 8BQ 8FD JG9 JQ2 L7M L~C L~D DOA |
DOI | 10.1109/ACCESS.2020.2992381 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE Xplore Open Access Journals IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Engineered Materials Abstracts METADEX Technology Research Database Materials Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Directory of Open Access Journals - May need to register for free articles |
DatabaseTitle | CrossRef Materials Research Database Engineered Materials Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace METADEX Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Materials Research Database |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 2169-3536 |
EndPage | 87045 |
ExternalDocumentID | oai_doaj_org_article_4cdb1ab42a4f408bbab2f99e428ee221 10_1109_ACCESS_2020_2992381 9088949 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Science Foundation of China grantid: 61972297; U1636107 funderid: 10.13039/501100001809 |
GroupedDBID | 0R~ 4.4 5VS 6IK 97E AAJGR ABAZT ABVLG ACGFS ADBBV AGSQL ALMA_UNASSIGNED_HOLDINGS BCNDV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD ESBDL GROUPED_DOAJ IPLJI JAVBF KQ8 M43 M~E O9- OCL OK1 RIA RIE RNS AAYXX CITATION RIG 7SC 7SP 7SR 8BQ 8FD JG9 JQ2 L7M L~C L~D |
ID | FETCH-LOGICAL-c408t-dd40eeb0544c2a1bdfa64a6e08adefd66cdd77b5917611f2d8a5369d479c06a63 |
IEDL.DBID | DOA |
ISSN | 2169-3536 |
IngestDate | Wed Aug 27 01:29:17 EDT 2025 Mon Jun 30 06:19:28 EDT 2025 Tue Jul 01 01:22:33 EDT 2025 Thu Apr 24 22:51:56 EDT 2025 Wed Aug 27 02:41:42 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://creativecommons.org/licenses/by/4.0/legalcode |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c408t-dd40eeb0544c2a1bdfa64a6e08adefd66cdd77b5917611f2d8a5369d479c06a63 |
ORCID | 0000-0002-4639-5824 0000-0002-6202-1937 |
OpenAccessLink | https://doaj.org/article/4cdb1ab42a4f408bbab2f99e428ee221 |
PQID | 2454101978 |
PQPubID | 4845423 |
PageCount | 15 |
ParticipantIDs | crossref_primary_10_1109_ACCESS_2020_2992381 proquest_journals_2454101978 ieee_primary_9088949 doaj_primary_oai_doaj_org_article_4cdb1ab42a4f408bbab2f99e428ee221 crossref_citationtrail_10_1109_ACCESS_2020_2992381 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 20200000 2020-00-00 20200101 2020-01-01 |
PublicationDateYYYYMMDD | 2020-01-01 |
PublicationDate_xml | – year: 2020 text: 20200000 |
PublicationDecade | 2020 |
PublicationPlace | Piscataway |
PublicationPlace_xml | – name: Piscataway |
PublicationTitle | IEEE access |
PublicationTitleAbbrev | Access |
PublicationYear | 2020 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0000816957 |
Score | 2.2620046 |
SourceID | doaj proquest crossref ieee |
SourceType | Open Website Aggregation Database Enrichment Source Index Database Publisher |
StartPage | 87031 |
SubjectTerms | Algorithms Classification Feature extraction Feature learning Hypertext systems Machine learning network embedding Nodes Phishing phishing detection Random walk Robustness Robustness (mathematics) Search engines similar property Uniform resource locators Visual aspects Visualization Websites |
Title | SPWalk: Similar Property Oriented Feature Learning for Phishing Detection |
URI | https://ieeexplore.ieee.org/document/9088949 https://www.proquest.com/docview/2454101978 https://doaj.org/article/4cdb1ab42a4f408bbab2f99e428ee221 |
Volume | 8 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
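
The abstract describes a biased random walk over a weblink network that combines structural information with the URL information of each node. The sketch below is only an illustration of that general idea, not the authors' algorithm: the record does not say how the bias is computed, so the Jaccard similarity over URL tokens, the mixing weight `alpha`, and all function names, graph contents, and URLs here are assumptions made for the example.

```python
import random
import re


def url_tokens(url):
    """Split a URL into a set of lowercase alphanumeric tokens."""
    return {t for t in re.split(r"[^a-z0-9]+", url.lower()) if t}


def url_similarity(url_a, url_b):
    """Jaccard similarity of URL token sets (an assumed stand-in for URL information)."""
    a, b = url_tokens(url_a), url_tokens(url_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def biased_walk(graph, urls, start, length, alpha=0.5, rng=None):
    """Walk up to `length` nodes, preferring neighbors whose URLs resemble the current one.

    alpha = 0 gives a uniform (purely structural) walk; larger alpha
    puts more weight on URL similarity. This weighting is hypothetical.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph.get(current, ()))
        if not neighbors:
            break  # dead end: stop the walk early
        weights = [
            (1.0 - alpha) + alpha * url_similarity(urls[current], urls[n])
            for n in neighbors
        ]
        if sum(weights) <= 0:
            weights = [1.0] * len(neighbors)  # fall back to a uniform choice
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, urls, walks_per_node=10, length=5, alpha=0.5, seed=0):
    """Start `walks_per_node` biased walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, urls, node, length, alpha, rng))
    return walks


# Toy weblink network: edges stand for hyperlink or textual-similarity references.
EXAMPLE_GRAPH = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
EXAMPLE_URLS = {
    "a": "http://example.com/login",
    "b": "http://example.com/account",
    "c": "http://paypa1-secure.test/login",
}

if __name__ == "__main__":
    for walk in generate_walks(EXAMPLE_GRAPH, EXAMPLE_URLS, walks_per_node=1, length=4):
        print(walk)
```

In network embedding pipelines of this kind, the generated walks are commonly treated as sentences and passed to a skip-gram model to obtain one vector per node; those vectors would then serve as the numeric features for a phishing/legitimate classifier, as the abstract outlines.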