Overcoming limitations of sampling for aggregation queries

Bibliographic Details
Published in Proceedings 17th International Conference on Data Engineering, pp. 534-542
Main Authors Chaudhuri, S., Das, G., Datar, M., Motwani, R., Narasayya, V.
Format Conference Proceeding
Language English
Published IEEE 2001
More Information
Summary: Studies the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To address this issue, we introduce a technique called outlier indexing. Uniform sampling is also ineffective for queries with low selectivity. We rely on weighted sampling based on workload information to overcome this shortcoming. We demonstrate that a combination of outlier indexing with weighted sampling can be used to answer aggregation queries with a significantly reduced approximation error compared to either uniform sampling or weighted sampling alone. We discuss the implementation of these techniques on Microsoft's SQL Server and present experimental results that demonstrate the merits of our techniques.
ISBN: 0769510019, 9780769510019
ISSN: 1063-6382, 2375-026X
DOI: 10.1109/ICDE.2001.914867