Concept Drift Detection in Phishing Using Autoencoders
Published in | Machine Learning and Metaheuristics Algorithms, and Applications, Vol. 1366, pp. 208-220 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published | Singapore: Springer (Springer Singapore), 2021 |
Series | Communications in Computer and Information Science |
Subjects | |
Online Access | Get full text |
ISBN | 9811604185 9789811604188 |
ISSN | 1865-0929 1865-0937 |
DOI | 10.1007/978-981-16-0419-5_17 |
Summary: | When machine learning models are built on non-stationary data, their performance naturally degrades over time because of concept drift: shifts in the underlying distribution of the data. A common solution is to retrain the model, which can be expensive both in obtaining new labeled data and in compute time. Traditionally, many approaches to concept drift detection operate on streaming data. However, drift is also prevalent in semi-stationary data such as web data, social media, and any data set generated from human behavior. Changing web technology causes concept drift in the website data used by phishing detection models. In this work, we create “Autoencoder Drift Detection” (ADD), an unsupervised drift detection mechanism suitable for semi-stationary data. We use the reconstruction error of the autoencoder as a proxy to detect concept drift. We use ADD to detect drift in a phishing detection data set that contains drift because it was collected over one year. We also show that ADD is competitive, within ±24%, with popular streaming drift detection algorithms on benchmark drift datasets. The average accuracy on the phishing data set is 0.473 without drift detection and increases to 0.648 with ADD. |
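
Illustrative note: the summary describes using an autoencoder's reconstruction error as a proxy for concept drift. The sketch below shows that general idea only and is not the chapter's ADD implementation. It swaps in a linear autoencoder (equivalent to PCA, fitted with NumPy's SVD) for a neural one. The function names, the synthetic data, and the threshold rule (`factor=2.0`) are all assumptions made for illustration, since this record does not give the authors' architecture or decision rule.

```python
import numpy as np

def fit_linear_autoencoder(X, n_components):
    """Fit a linear autoencoder (PCA stand-in) on reference data X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions; keep the top n_components.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Mean squared reconstruction error for each sample in X."""
    centered = X - mean
    # Encode (project onto components), then decode (map back).
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

def drift_detected(reference_errors, new_errors, factor=2.0):
    """Hypothetical rule: flag drift when the new batch's mean error
    exceeds `factor` times the reference batch's mean error."""
    return float(np.mean(new_errors)) > factor * float(np.mean(reference_errors))

rng = np.random.default_rng(0)

# Synthetic data lying close to a 2-D subspace of a 10-D feature space.
basis = rng.normal(size=(2, 10))
reference = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))
same_dist = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 10))
# A "drifted" batch that no longer follows the learned structure.
drifted = rng.normal(size=(200, 10))

mean, components = fit_linear_autoencoder(reference, n_components=2)
ref_err = reconstruction_error(reference, mean, components)

same_flag = drift_detected(ref_err, reconstruction_error(same_dist, mean, components))
drift_flag = drift_detected(ref_err, reconstruction_error(drifted, mean, components))
print("same distribution flagged:", same_flag)
print("drifted batch flagged:", drift_flag)
```

The check needs no labels for the new batch, which is what makes an approach of this kind unsupervised: only the model fitted on the reference period and the new feature vectors are required. In practice the threshold would need to be calibrated on held-out reference data rather than fixed by hand.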