Machine Learning-Based Advertisement Banner Identification Technique for Effective Piracy Website Detection Process
Published in | Computers, materials & continua Vol. 71; no. 2; pp. 2883 - 2899 |
---|---|
Main Authors | Adeba Jilcha, Lelisa; Kwak, Jin |
Format | Journal Article |
Language | English |
Published | Henderson: Tech Science Press, 2022 |
Abstract | Digital content that is subject to copyright faces a significant threat from copyright infringement, and billions of dollars are lost annually to this illegal activity. Blocking the offending websites, particularly through affiliated government bodies, is currently believed to be the most effective countermeasure. An effective detection mechanism is a necessary first step. Researchers have used various approaches to analyze features common to suspected piracy websites. For instance, most of these websites serve online advertisements, which are considered their main source of revenue. These advertisements also share attributes that distinguish them from advertisements posted on normal or legitimate websites. They usually contain keywords such as click-words (words that redirect users to install malicious software) and words frequently used in illegal gambling, illegal sexual acts, and so on. This makes them well suited as a key feature for detecting websites involved in copyright infringement. Research has been conducted to identify advertisements served on suspected piracy websites. However, these studies use a static approach that relies mainly on manual scanning for the aforementioned keywords. This limits their ability to cope with the dynamic and ever-changing behavior of advertisements posted on these websites. Therefore, we propose a technique that can continuously fine-tune itself and is intelligent enough to effectively identify advertisement (Ad) banners extracted from suspected piracy websites. The technique leverages machine learning algorithms, specifically a support vector machine combined with the word2vec word-embedding model. After applying the proposed technique to 1015 Ad banners collected from 98 suspected piracy websites and 90 normal or legitimate websites, we identified Ad banners extracted from suspected piracy websites with an accuracy of 97%. We present this technique in the hope that it will be a useful tool for piracy website detection approaches. To our knowledge, this is the first approach that uses machine learning to identify Ad banners served on suspected piracy websites. |
---|---|
Author | Adeba Jilcha, Lelisa; Kwak, Jin |
ContentType | Journal Article |
Copyright | 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.32604/cmc.2022.023167 |
Discipline | Computer Science |
EISSN | 1546-2226 |
ISSN | 1546-2226 1546-2218 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
SubjectTerms | Algorithms; Gambling; Infringement; Machine learning; Malware; Piracy; Support vector machines; Web sites; Websites |
URI | https://www.proquest.com/docview/2615684299 |
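
The abstract describes classifying Ad banner text with a support vector machine over word2vec embeddings. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the tiny hand-written embedding table stands in for a trained word2vec model, the linear SVM is trained with plain subgradient descent on the hinge loss rather than a library solver, and the example texts, labels, and function names are hypothetical.

```python
import random

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
# A real word2vec model is learned from a corpus and has far more dimensions.
TOY_EMBEDDINGS = {
    "casino":   [0.9, 0.1, 0.0],
    "bet":      [0.8, 0.2, 0.1],
    "jackpot":  [0.9, 0.0, 0.1],
    "download": [0.7, 0.3, 0.0],
    "shoes":    [0.1, 0.9, 0.2],
    "sale":     [0.2, 0.8, 0.1],
    "laptop":   [0.0, 0.9, 0.3],
    "delivery": [0.1, 0.7, 0.2],
}
DIM = 3

# Hypothetical training texts: +1 = banner from a suspected piracy site,
# -1 = banner from a normal or legitimate site.
TRAIN = [
    ("casino bet", 1),
    ("jackpot download", 1),
    ("bet jackpot casino", 1),
    ("download casino", 1),
    ("shoes sale", -1),
    ("laptop delivery", -1),
    ("sale laptop", -1),
    ("delivery shoes", -1),
]


def embed(text):
    """Represent a banner text as the average of its known word vectors."""
    vecs = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on hinge loss + L2 penalty."""
    rng = random.Random(seed)
    w, b = [0.0] * DIM, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink weights (L2 penalty)
            if margin < 1.0:  # sample violates the margin: step toward it
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(text, w, b):
    """Return +1 (suspected piracy-site banner) or -1 (legitimate-site banner)."""
    score = sum(wj * xj for wj, xj in zip(w, embed(text))) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    X = [embed(text) for text, _ in TRAIN]
    y = [label for _, label in TRAIN]
    w, b = train_linear_svm(X, y)
    for text in ("casino jackpot", "laptop sale"):
        print(text, "->", predict(text, w, b))
```

In practice the embedding table would be replaced by vectors from a trained word2vec model and the hand-rolled trainer by an established SVM implementation; the sketch only shows how averaged word vectors can feed a margin-based classifier.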