Rule‐based preprocessing for data stream mining using complex event processing

Bibliographic Details
Published in Expert systems Vol. 38; no. 8
Main Authors Ramírez, Aurora; Moreno, Nathalie; Vallecillo, Antonio
Format Journal Article
Language English
Published Oxford Blackwell Publishing Ltd 01.12.2021
Summary: Data preprocessing is known to be essential to produce accurate data from which mining methods are able to extract valuable knowledge. When data constantly arrives from one or more sources, preprocessing techniques need to be adapted to efficiently handle these data streams. To help domain experts define and execute preprocessing tasks for data streams, this paper proposes the use of active rule‐based systems and, more specifically, complex event processing (CEP) languages and engines. The main contribution of our approach is the formulation of preprocessing procedures as event detection rules, expressed in an SQL‐like language, that provide domain experts with a simple way to manipulate temporal data. This idea is materialized into a publicly available solution that integrates a CEP engine with a library for online data mining. To evaluate our approach, we present three practical scenarios in which CEP rules preprocess data streams with the aim of adding temporal information, transforming features and handling missing values. Experiments show how CEP rules provide an effective language to express preprocessing tasks in a modular and high‐level manner, without significant time and memory overheads. The resulting data streams not only help improve the predictive accuracy of classification algorithms, but also, in some cases, reduce the complexity of the decision models and the time needed for learning.
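The summary mentions handling missing values in a stream as one of the preprocessing scenarios. As a rough illustration of that idea only, the sketch below imputes missing readings with the mean of a sliding window of recent values. It is plain Python, not the SQL‐like CEP language or the software described in the article, and the function name, the window size, and the sample readings are all hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional


def impute_with_window_mean(
    stream: Iterable[Optional[float]], window_size: int = 3
) -> Iterator[Optional[float]]:
    """Replace missing readings (None) with the mean of the last
    `window_size` observed values (hypothetical helper, for illustration)."""
    window: Deque[float] = deque(maxlen=window_size)
    for value in stream:
        if value is None:
            # "Rule" fires on a missing-value event: use the window mean,
            # or pass None through if nothing has been observed yet.
            yield sum(window) / len(window) if window else None
        else:
            # Only genuinely observed values enter the window.
            window.append(value)
            yield value


# Made-up readings; None marks a missing value.
readings = [10.0, None, 14.0, None, None, 20.0]
cleaned = list(impute_with_window_mean(readings, window_size=2))
print(cleaned)
```

Imputed values are deliberately kept out of the window so that a long run of missing readings does not feed on its own estimates; a different policy would be equally easy to express.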
Funding information: Andalusian Regional Government, Grant/Award Number: DOC_00944; Spanish Government under project COSCA, Grant/Award Number: PGC2018‐094905‐B‐I00; European Commission (FEDER); University of Córdoba.
ISSN: 0266-4720, 1468-0394
DOI: 10.1111/exsy.12762