Recurring concept detection for spam filtering

Bibliographic Details
Published in 17th International Conference on Information Fusion (FUSION), pp. 1-7
Main Authors Abad, Miguel Angel; Gomes, Joao Bartolo; Menasalvas, Ernestina
Format Conference Proceeding
Language English
Published International Society of Information Fusion 01.07.2014
Summary: In this work we address the problem of recurring concept drifts and propose a framework to manage them. Its implementation and evaluation target spam detection, a real-world setting in which concepts (spam patterns) may reappear. Detecting recurring drifts allows previously learnt models to be reused, improving the overall learning process in terms of accuracy and efficiency. To this end, we propose the Meta-Model Drift Detector (MM-DD). The proposed system is able to deal with the underlying context that results from the drifts detected throughout the data stream learning process. To do so, a meta-model is trained in parallel with the learning process. Whenever a drift occurs, the base classifier's learning process feeds the meta-model with the associated context information, so that the latter can predict recurrent situations in the near future. Therefore, when a drift is detected, the meta-model checks whether the context information matches any context previously handled by the learning process and provides the most suitable stored model to deal with the concept. Our experimental results support the value of the proposed MM-DD in terms of accuracy when compared with existing approaches.
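
The model-reuse step described in the summary can be illustrated with a minimal sketch. This is not the authors' MM-DD implementation: the class names (BaseClassifier, ModelRepository), the on_drift method, and the tuple-shaped context are all hypothetical, and a plain dictionary lookup on exact context equality stands in for the trained meta-model. Drift detection itself is assumed to happen elsewhere.

```python
class BaseClassifier:
    """Hypothetical placeholder for a real incremental spam classifier."""

    def __init__(self):
        self.examples_seen = 0

    def learn(self, features, label):
        # A real classifier would update its parameters here.
        self.examples_seen += 1


class ModelRepository:
    """Stores one model per context seen at drift time (assumed design).

    An exact-match dictionary lookup replaces the meta-model's prediction.
    """

    def __init__(self, model_factory):
        self._model_factory = model_factory
        self._stored = {}  # context -> model

    def on_drift(self, context):
        """Return (model, reused) for the context observed at a drift."""
        if context in self._stored:
            # Recurring concept: reuse the previously learnt model.
            return self._stored[context], True
        # New concept: start a fresh model and remember it.
        model = self._model_factory()
        self._stored[context] = model
        return model, False


# Usage: three drifts, the third returning to the first context.
repo = ModelRepository(BaseClassifier)
model_a, reused_a = repo.on_drift(("en", "pharma"))
model_a.learn({"viagra": 1}, "spam")
model_b, reused_b = repo.on_drift(("en", "phishing"))
model_c, reused_c = repo.on_drift(("en", "pharma"))
```

In this sketch the third drift returns the same object that was trained after the first drift, so its accumulated state is kept instead of being relearnt from scratch. A trained meta-model, as the summary describes, would presumably generalise beyond exact context matches, which this lookup does not.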