Rule‐based preprocessing for data stream mining using complex event processing
Published in | Expert systems Vol. 38; no. 8 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Oxford: Blackwell Publishing Ltd, 01.12.2021 |
Summary: | Data preprocessing is essential for producing accurate data from which mining methods can extract valuable knowledge. When data constantly arrives from one or more sources, preprocessing techniques need to be adapted to handle these data streams efficiently. To help domain experts define and execute preprocessing tasks for data streams, this paper proposes the use of active rule‐based systems and, more specifically, complex event processing (CEP) languages and engines. The main contribution of our approach is the formulation of preprocessing procedures as event detection rules, expressed in an SQL‐like language, that provide domain experts with a simple way to manipulate temporal data. This idea is materialized in a publicly available solution that integrates a CEP engine with a library for online data mining. To evaluate our approach, we present three practical scenarios in which CEP rules preprocess data streams with the aim of adding temporal information, transforming features and handling missing values. Experiments show that CEP rules provide an effective language for expressing preprocessing tasks in a modular and high‐level manner, without significant time and memory overheads. The resulting data streams not only help improve the predictive accuracy of classification algorithms, but in some cases also reduce the complexity of the decision models and the time needed for learning. |
---|---|
Bibliography: | Funding information Andalusian Regional Government, Grant/Award Number: DOC_00944; Spanish Government under project COSCA, Grant/Award Number: PGC2018‐094905‐B‐I00; European Commission (FEDER); University of Córdoba. |
ISSN: | 0266-4720, 1468-0394 |
DOI: | 10.1111/exsy.12762 |