Machine Learning-Based Advertisement Banner Identification Technique for Effective Piracy Website Detection Process

Bibliographic Details
Published in Computers, Materials & Continua, Vol. 71, no. 2, pp. 2883-2899
Main Authors Adeba Jilcha, Lelisa; Kwak, Jin
Format Journal Article
Language English
Published Henderson: Tech Science Press, 2022
Abstract In the contemporary world, copyrighted digital content faces significant challenges from copyright infringement, and billions of dollars are lost annually as a result. The most effective current countermeasure is believed to be blocking infringing websites, particularly through affiliated government bodies. An effective detection mechanism is a necessary first step. Some researchers have analyzed features that suspected piracy websites have in common. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. These advertisements share attributes that distinguish them from advertisements posted on normal or legitimate websites: they usually contain click-words (words that redirect users to install malicious software) and words frequently used in connection with illegal gambling, illegal sexual acts, and so on. This makes them well suited to serve as a key feature for detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords, which limits their ability to cope with the dynamic, ever-changing behavior of the advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and effectively identify advertisement (Ad) banners extracted from suspected piracy websites. We do this by leveraging machine learning algorithms, specifically a support vector machine with the word2vec word-embedding model.
After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we identified Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique in the hope that it will be a useful tool for piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites.
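As a rough illustration of the classification idea described in the abstract, the sketch below averages per-word vectors for an Ad banner's text and trains a linear support vector machine on the result. It is not the authors' implementation. The three-dimensional `TOY_VECTORS`, the example banner texts, the labels, and the hyperparameters are all invented for illustration; a real pipeline would use trained word2vec embeddings, and the trainer here is a simple hinge-loss subgradient-descent loop rather than a library SVM.

```python
import random

# ASSUMPTION: hand-made 3-dimensional vectors standing in for trained
# word2vec embeddings. Real embeddings would be learned from a corpus.
TOY_VECTORS = {
    "bet": [0.9, 0.1, 0.0],
    "casino": [0.8, 0.2, 0.1],
    "jackpot": [0.9, 0.0, 0.1],
    "download": [0.7, 0.3, 0.0],
    "click": [0.6, 0.2, 0.1],
    "shoes": [0.1, 0.9, 0.2],
    "laptop": [0.0, 0.8, 0.3],
    "sale": [0.2, 0.7, 0.1],
    "travel": [0.1, 0.8, 0.4],
    "insurance": [0.0, 0.9, 0.3],
}
DIM = 3


def embed(text):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Stochastic subgradient descent on L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (dot(w, X[i]) + b)
            # Regularization step: shrink the weights (the bias is not shrunk).
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only examples inside the margin push on w and b.
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (piracy-site-style Ad) or -1 (ordinary Ad)."""
    return 1 if dot(w, x) + b >= 0 else -1


# ASSUMPTION: invented banner texts and labels, for illustration only.
TRAIN = [
    ("bet casino jackpot", 1),
    ("click download casino", 1),
    ("jackpot bet click", 1),
    ("shoes sale", -1),
    ("laptop insurance", -1),
    ("travel sale shoes", -1),
]
X_TRAIN = [embed(text) for text, _ in TRAIN]
Y_TRAIN = [label for _, label in TRAIN]

if __name__ == "__main__":
    w, b = train_linear_svm(X_TRAIN, Y_TRAIN)
    for text in ("casino jackpot click", "laptop sale"):
        print(text, "->", predict(w, b, embed(text)))
```

Averaging word vectors is only one simple way to turn banner text into a fixed-length feature; the record does not state how the authors combine word2vec output with the SVM, so that step should be read as an assumption.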
Copyright 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.32604/cmc.2022.023167
Discipline Computer Science
EISSN 1546-2226
ISSN 1546-2226; 1546-2218
Subject Terms Algorithms; Gambling; Infringement; Machine learning; Malware; Piracy; Support vector machines; Websites
URI https://www.proquest.com/docview/2615684299