Enhancing the DISSFCM Algorithm for Data Stream Classification

Bibliographic Details
Published in: Fuzzy Logic and Applications, Vol. 11291, pp. 109-122
Main Authors: Casalino, Gabriella; Castellano, Giovanna; Fanelli, Anna Maria; Mencar, Corrado
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2019
Series: Lecture Notes in Computer Science
ISBN: 9783030125431, 3030125432
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-12544-8_9

Summary: Analyzing data streams has become a new challenge in meeting the demands of real-time analytics. Conventional mining techniques prove inefficient at coping with the challenges of data streams, which include resource constraints such as memory and running time, together with the need to process the data in a single scan. Most existing data stream classification methods require labeled samples, which are more difficult and expensive to obtain than unlabeled ones. Semi-supervised learning algorithms can address this problem by using unlabeled samples together with a few labeled ones to build classification models. Recently we proposed DISSFCM, an algorithm for data stream classification based on incremental semi-supervised fuzzy clustering. To cope with the evolution of data, DISSFCM dynamically adapts the number of clusters by splitting large-scale clusters. While splitting is effective in improving the quality of clusters, applying it repeatedly without a counterbalance may produce many small-scale clusters. To solve this problem, in this paper we enhance DISSFCM by introducing a procedure that merges small-scale clusters. Preliminary experimental results on a real-world benchmark dataset show the effectiveness of the method.