Aggregation of imprecise and uncertain information in databases

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 13, no. 6, pp. 902-912
Main Authors	McClean, S., Scotney, B., Shapcott, M.
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2001
Subjects	Agglomeration; Algebra; Data models; Database systems; Deductive databases; Divergence; Information retrieval; Operators; Probability; Probability distribution; Probability theory; Queries; Query processing; Relational databases; Stochastic processes; Studies; Uncertainty
ISSN	1041-4347, 1558-2191
DOI	10.1109/69.971186

Summary:	Information stored in a database is often subject to uncertainty and imprecision. Probability theory provides a well-known and well-understood way of representing uncertainty and may thus be used to provide a mechanism for storing uncertain information in a database. We consider the problem of aggregation using an imprecise probability data model that allows us to represent imprecision by partial probabilities and uncertainty by probability distributions. Most work to date has concentrated on extending the relational algebra so that traditional queries can be executed on uncertain or imprecise data. However, for imprecise and uncertain data, we often require aggregation operators that provide information on patterns in the data. Thus, while traditional query processing is tuple-driven, processing of uncertain data is often attribute-driven, where we use aggregation operators to discover attribute properties. The aggregation operator that we define uses the Kullback-Leibler information divergence between the aggregated probability distribution and the individual tuple values to provide a probability distribution for the domain values of an attribute or group of attributes. The provision of such aggregation operators is a central requirement in furnishing a database with the capability to perform the operations necessary for knowledge discovery in databases.
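
The summary describes an aggregation operator that turns per-tuple partial probabilities into one distribution over an attribute's domain values. The Python sketch below is only an illustration of that idea and is not taken from the article. It assumes each tuple stores probability mass on subsets of the domain, and it uses an EM-style fixed-point iteration that shares each subset's mass among its members in proportion to the current aggregate. When every tuple is a precise distribution, this reduces to the arithmetic mean, which is the distribution minimising the summed Kullback-Leibler divergence from the tuples. The names `PartialDist` and `aggregate` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical representation: a tuple's attribute value is a set of
# (subset of domain values -> probability mass) pairs summing to 1.
PartialDist = Dict[FrozenSet[str], float]


def aggregate(tuples: List[PartialDist], domain: List[str],
              iters: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Illustrative aggregate of partial probabilities (assumed scheme)."""
    # Start from a uniform distribution so every domain value has support.
    p = {v: 1.0 / len(domain) for v in domain}
    m = len(tuples)
    for _ in range(iters):
        new = {v: 0.0 for v in domain}
        for t in tuples:
            for subset, mass in t.items():
                total = sum(p[v] for v in subset)
                if total == 0.0:
                    continue  # no current support inside this subset
                # Share the subset's mass in proportion to the current p.
                for v in subset:
                    new[v] += mass * p[v] / total
        new = {v: x / m for v, x in new.items()}
        delta = max(abs(new[v] - p[v]) for v in domain)
        p = new
        if delta < tol:
            break
    return p


# Two precise tuples: the aggregate is their arithmetic mean.
precise = [
    {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5},
    {frozenset({"a"}): 1.0},
]
result = aggregate(precise, ["a", "b"])
```

In this sketch an imprecise tuple such as `{frozenset({"a", "b"}): 1.0}` contributes no preference between `a` and `b` on its own; its mass follows whatever the precise tuples suggest.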