A new evolving clustering algorithm for online data streams

Bibliographic Details
Published in	2016 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), pp. 162-168
Main Authors	Bezerra, Clauber Gomes; Costa, Bruno Sielly Jales; Guedes, Luiz Affonso; Angelov, Plamen Parvanov
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2016
Subjects	autonomous learning; clustering; Clustering algorithms; Conferences; data streams; eccentricity; Electronic mail; evolving systems; Mathematical model; real-time; Real-time systems; Shape; TEDA; Training; typicality
Summary:	In this paper, we propose a new approach to fuzzy data clustering. We present a new algorithm, called TEDA-Cloud, based on the recently introduced TEDA approach to outlier detection. TEDA-Cloud is a statistical method that uses the concepts of typicality and eccentricity to group similar data observations. Instead of traditional clusters, the data are grouped into granular units called data clouds, which are structures with no predefined shape or set boundaries. TEDA-Cloud is a fully autonomous, self-evolving algorithm that can be used to cluster online data streams and in applications that require real-time response. It is able to "start from scratch" (from an empty knowledge base) and to create, update, and merge data clouds without requiring any user-defined parameters (e.g. number of clusters, size, radius) or previous training. Moreover, unlike most traditional statistical approaches, TEDA-Cloud does not rely on a specific data distribution or on the assumption of independence of data samples. The results, obtained on multiple data sets that are well known in the literature, are very encouraging.
DOI:	10.1109/EAIS.2016.7502508
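
Illustrative note: the summary refers to typicality and eccentricity without defining them. The sketch below shows one way these quantities are commonly computed recursively in the TEDA literature, using a running mean and variance over a data stream. It is not taken from this record and is not the TEDA-Cloud algorithm itself (no data clouds are created or merged). The class name `RecursiveEccentricity`, the helper names, and the Chebyshev-style threshold parameter `m` are assumptions introduced here for illustration only.

```python
import numpy as np


class RecursiveEccentricity:
    """Running mean/variance and a TEDA-style eccentricity (hypothetical helper)."""

    def __init__(self):
        self.k = 0        # number of samples seen so far
        self.mean = None  # running mean vector
        self.var = 0.0    # running scalar variance

    def update(self, x):
        """Absorb sample x and return its eccentricity w.r.t. the stream so far."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        k = self.k
        if k == 1:
            self.mean = x.copy()
            self.var = 0.0
            return 1.0  # convention used here: a lone sample has eccentricity 1
        # Recursive mean update, then squared distance to the updated mean.
        self.mean = (k - 1) / k * self.mean + x / k
        d2 = float(np.dot(x - self.mean, x - self.mean))
        # Recursive variance update.
        self.var = (k - 1) / k * self.var + d2 / (k - 1)
        if self.var == 0.0:
            return 1.0 / k  # all samples identical so far
        return 1.0 / k + d2 / (k * self.var)


def typicality(ecc):
    """Typicality is taken here as the complement of eccentricity."""
    return 1.0 - ecc


def is_eccentric(ecc, k, m=3.0):
    """Assumed Chebyshev-style test on the normalized eccentricity ecc / 2."""
    return ecc / 2.0 > (m ** 2 + 1.0) / (2.0 * k)


# Usage: twenty samples near zero followed by one distant sample.
pattern = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05]
stream = RecursiveEccentricity()
for value in pattern * 2:
    last_inlier_ecc = stream.update([value])
inlier_flag = is_eccentric(last_inlier_ecc, stream.k)
outlier_ecc = stream.update([10.0])
outlier_flag = is_eccentric(outlier_ecc, stream.k)
```

In this sketch only the sample count, the mean, and a scalar variance are stored, so each new sample is processed in constant memory, which is the property that makes this family of methods suitable for online streams.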