Approximate Aggregations in Structured P2P Networks

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 23, no. 11, pp. 1748-1752
Main Authors	Sun, Dalie; Wu, Sai; Jiang, Shouxu; Li, Jianzhong
Format	Journal Article
Language	English
Published	New York, NY: IEEE, 01.11.2011; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Agglomeration | Aggregates | Aggregation | Applied sciences | Approximate query processing | Approximation | BATON | Business | Computer science; control theory; systems | Computer systems and distributed systems. User interface | Database | Database query | Distributed computing | Distributed system | Estimation | Exact sciences and technology | High performance | Indexes | Information systems. Data bases | Memory organisation. Data processing | Networks | Peer to peer | Peer to peer computing | Peer-to-Peer | Probabilistic approach | Queries | Query processing | Random access | Random number | Random numbers | Servers | Software | Studies
Summary:	In corporate networks, daily business data are generated in gigabytes or even terabytes, and it is costly to process aggregate queries in those systems. In this paper, we propose PACA, a probably approximately correct aggregate query processing scheme, for answering aggregate queries in structured Peer-to-Peer (P2P) networks. PACA retrieves random samples from peers' databases and uses the samples to process queries. Instead of scanning the entire database of each peer, PACA accesses only a small random sample of the data. Moreover, based on the query distribution, PACA publishes a precomputed synopsis and uses it to answer future queries. Most queries are expected to be answered partially or fully by the precomputed synopsis, which is adaptively tuned to follow the query distribution. Experiments on PlanetLab show the effectiveness of the approach.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2010.198
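
The summary describes answering aggregate queries from random samples rather than full scans. The sketch below illustrates that general idea only. It is not PACA's algorithm: the record does not say how PACA draws samples, sizes them, or builds its synopsis. The function names, the in-memory lists standing in for peer databases, and the use of Hoeffding's inequality to choose the sample size are all assumptions made for illustration.

```python
import math
import random


def hoeffding_sample_size(value_range, epsilon, delta):
    """Samples needed so that the sample mean is within epsilon of the true
    mean with probability at least 1 - delta, for values spanning at most
    value_range (Hoeffding's inequality; an assumption, not PACA's bound)."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def approximate_sum(peers, n_samples, rng):
    """Estimate the SUM over all peers from n_samples randomly drawn values.

    peers: list of lists of numbers (hypothetical stand-in for peer databases).
    """
    sizes = [len(p) for p in peers]
    total = sum(sizes)
    # Pick a peer with probability proportional to its size, then a value
    # uniformly inside it, so every value is equally likely (with replacement).
    chosen = rng.choices(range(len(peers)), weights=sizes, k=n_samples)
    sample = [rng.choice(peers[i]) for i in chosen]
    sample_mean = sum(sample) / n_samples
    # Scale the estimated mean by the total number of values.
    return sample_mean * total


if __name__ == "__main__":
    rng = random.Random(7)
    # 20 hypothetical peers, each holding 5000 values in [0, 100].
    peers = [[rng.uniform(0, 100) for _ in range(5000)] for _ in range(20)]
    n = hoeffding_sample_size(value_range=100, epsilon=2.0, delta=0.05)
    estimate = approximate_sum(peers, n, rng)
    exact = sum(sum(p) for p in peers)
    print(n, round(estimate), round(exact))
```

With these illustrative parameters the estimate reads a few thousand values instead of all 100,000, which is the trade-off the summary points to. The sketch assumes the peer sizes are known in advance and omits the precomputed synopsis entirely.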