A Framework for Clustering Uncertain Data Streams

Bibliographic Details
Published in: 2008 IEEE 24th International Conference on Data Engineering, pp. 150-159
Main Authors: Aggarwal, C.C.; Yu, P.S.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2008
Summary: In recent years, uncertain data management applications have grown in importance because of the large number of hardware applications that measure data approximately. For example, sensor readings are typically expected to contain considerable noise because of inaccuracies in data retrieval and transmission, and because of power failures. In many cases, the estimated error of the underlying data stream is available. This information is very useful for the mining process, since it can be used to improve the quality of the underlying results. In this paper, we propose a method for clustering uncertain data streams. We use a very general model of the uncertainty in which we assume that only a few statistical measures of the uncertainty are available. We show that the use of even modest uncertainty information during the mining process is sufficient to greatly improve the quality of the underlying results. We show that our approach is more effective than a purely deterministic method such as the CluStream approach. We test the approach on a variety of real and synthetic data sets and illustrate the advantages of the method in terms of effectiveness and efficiency.
ISBN: 9781424418367, 1424418364
ISSN: 1063-6382, 2375-026X
DOI: 10.1109/ICDE.2008.4497423