SPWalk: Similar Property Oriented Feature Learning for Phishing Detection

Bibliographic Details
Published in IEEE Access, Vol. 8, pp. 87031-87045
Main Authors Liu, Xiuwen; Fu, Jianming
Format Journal Article
Language English
Published Piscataway: IEEE, 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Detecting phishing webpages is an essential task that protects legitimate websites and their users from various malicious activities. Classifying a suspect webpage as phishing or legitimate requires robust and effective features. However, recent phishing attacks usually make phishing webpages resemble legitimate webpages in both visual and functional aspects, which makes feature extraction more difficult. We propose SPWalk, an unsupervised feature learning algorithm for phishing detection. In SPWalk, similar property nodes refer to a collection of phishing webpages or a collection of legitimate webpages. We first construct a weblink network whose nodes represent webpages. The edges between nodes represent the reference relationships that connect webpages through hyperlinks or similar textual content. SPWalk then applies a network embedding technique to map nodes into a low-dimensional vector space. A biased random walk procedure efficiently integrates both the structural information between nodes and the URL information of each node. The effectiveness and robustness of SPWalk come from three points. (1) Phishing attackers do not have full control over reference relationships. (2) The structural regularities generated by diverse reference relationships can be exploited to discriminate between phishing and legitimate webpages. (3) Node URL information makes the learned node representations better suited to phishing detection. Using the node representations as numeric features, we conduct experiments to classify webpages as legitimate or phishing. We demonstrate the superiority of SPWalk over state-of-the-art techniques for phishing detection, especially in terms of precision (over 95%). Even when phishing webpages are well camouflaged by attackers to evade detection, SPWalk consistently exhibits better classification efficacy.
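The biased random walk described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the walk is biased, so everything below is an assumption made for illustration: the toy graph, the example URLs, the helper names (`url_similarity`, `biased_walk`, `generate_walks`), and the choice to weight each step by string similarity between URLs. It is not the authors' implementation.

```python
import random
from difflib import SequenceMatcher


def url_similarity(a, b):
    """Illustrative URL similarity in [0, 1] (an assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def biased_walk(graph, urls, start, length, rng):
    """Walk over `graph`, preferring neighbors whose URL resembles the current node's URL."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = graph.get(current, [])
        if not neighbors:
            break  # dead end: stop the walk early
        # The small constant keeps every neighbor reachable even at zero similarity.
        weights = [0.1 + url_similarity(urls[current], urls[n]) for n in neighbors]
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, urls, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(graph, urls, node, length, rng))
    return walks


# Toy weblink network: nodes are webpages, edges are reference relationships.
graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
}
# Hypothetical URLs attached to each node.
urls = {
    "a": "http://example.com/login",
    "b": "http://example.com/account",
    "c": "http://examp1e-login.test/verify",
    "d": "http://examp1e-login.test/update",
}
walks = generate_walks(graph, urls, walks_per_node=2, length=5)
```

In a pipeline of the kind the abstract outlines, node sequences like `walks` would typically be passed to an embedding model to obtain low-dimensional node vectors, which then serve as numeric features for a classifier.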
Authors
Liu, Xiuwen (ORCID 0000-0002-6202-1937): Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Fu, Jianming (ORCID 0000-0002-4639-5824; email jmfu@whu.edu.cn): Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, China
CODEN IAECCG
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/ACCESS.2020.2992381
Discipline Engineering
EISSN 2169-3536
EndPage 87045
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation of China
  grantid: 61972297; U1636107
  funderid: 10.13039/501100001809
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccessLink https://doaj.org/article/4cdb1ab42a4f408bbab2f99e428ee221
PageCount 15
PublicationDate 2020-01-01
PublicationPlace Piscataway
PublicationTitle IEEE Access
PublicationYear 2020
Publisher The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 87031
SubjectTerms Algorithms
Classification
Feature extraction
Feature learning
Hypertext systems
Machine learning
network embedding
Nodes
Phishing
phishing detection
Random walk
Robustness
Robustness (mathematics)
Search engines
similar property
Uniform resource locators
Visual aspects
Visualization
Websites