Piracema: a Phishing snapshot database for building dataset features

Bibliographic Details
Published in Scientific Reports Vol. 12; no. 1; p. 15149
Main Authors Gomes de Barros, Julio Cesar; Revoredo da Silva, Carlo Marcelo; Candeia Teixeira, Lucas; Torres Fernandes, Bruno José; Lorenzato de Oliveira, Joao Fausto; Luzeiro Feitosa, Eduardo; Pinheiro dos Santos, Wellington; Ferraz Arcoverde, Henrique; Cardoso Garcia, Vinicius
Format Journal Article
Language English
Published London Nature Publishing Group UK 07.09.2022
Nature Publishing Group
Nature Portfolio
Summary: Phishing is an attack characterized by attempted fraud against users. The attacker develops a malicious page that poses as a trusted environment, inducing victims to submit sensitive data. Several platforms, such as PhishTank and OpenPhish, maintain databases of malicious pages to support anti-phishing solutions such as block lists and machine learning. A problem with this scenario is that many of these databases are disorganized and inconsistent, and have limitations regarding integrity and balance. In addition, because phishing is so volatile, considerable effort goes into preserving temporal information from each malicious page. To contribute, this article built a phishing database with consistent and balanced data, temporal information, and a significant number of occurrences, totaling 942,471 records over the 5 years between 2016 and 2021. Of these records, 135,542 preserve the page's source code, 258,416 have the attack target brand detected, 70,597 have the hosting service identified, and 15,008 have the shortener service discovered. Additionally, 123,285 records store WHOIS information of the domain registered in 2021. The data is available at https://piracema.io/repository.
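The counts in the summary are easier to compare as shares of the whole database. The following is a minimal sketch that only restates the figures quoted above; the dictionary keys are illustrative labels chosen here, not field names from the Piracema repository.

```python
# Record counts as stated in the summary above.
TOTAL_RECORDS = 942_471

# Keys are illustrative labels (assumptions), not the repository's field names.
records_with = {
    "source_code": 135_542,
    "target_brand": 258_416,
    "hosting_service": 70_597,
    "shortener_service": 15_008,
    "whois": 123_285,
}

# Fraction of all records that carry each kind of information.
coverage = {name: count / TOTAL_RECORDS for name, count in records_with.items()}

for name, share in coverage.items():
    print(f"{name:<18} {share:6.1%}")
```

Each share is computed against the full record count, so the categories overlap and are not expected to sum to 100%.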
ISSN: 2045-2322
DOI: 10.1038/s41598-022-19442-8