Comparative studies of sampling for analytics on massive data

Bibliographic Details
Published in: 2016 3rd International Conference on Systems and Informatics (ICSAI), pp. 1002-1007
Main Authors: Xuan Zhang, Dongsheng Li
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2016
Summary: Groupwise analytics on big data has been widely used in statistics, computer science, parallel computing, and many other fields in recent years, and aggregation queries are among its most important techniques. In the big data era, aggregation queries over ever-increasing data volumes consume a great deal of time, and the traditional approach of traversing the entire dataset is not acceptable to users. Data sampling processes only part of the data to obtain an approximate result; it can save a lot of time on very large datasets at the cost of some accuracy. This paper introduces several data sampling algorithms for approximate aggregation queries on big data and analyzes the advantages and shortcomings of each method, including techniques that apply to sparse data, that is, data with a limited population but a wide range.
DOI:10.1109/ICSAI.2016.7811097
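
Illustration: the general idea in the summary, answering a groupwise aggregation query from a sample instead of a full scan, can be sketched as follows. This is a minimal sketch using plain uniform random sampling, not one of the specific algorithms surveyed in the paper. The function names, the synthetic dataset, the group labels `large` and `small`, and the 1% sampling fraction are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def exact_group_means(rows):
    """Full scan: mean of `value` for every group in (group, value) rows."""
    sums, counts = defaultdict(float), defaultdict(int)
    for group, value in rows:
        sums[group] += value
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


def sampled_group_means(rows, fraction, seed=0):
    """Approximate per-group means from a uniform random sample of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))  # number of rows actually processed
    sample = rng.sample(rows, k)           # uniform sampling without replacement
    return exact_group_means(sample)


def make_rows(seed=0):
    """Hypothetical dataset: one very large group and one very small group."""
    rng = random.Random(seed)
    rows = [("large", rng.gauss(100.0, 10.0)) for _ in range(100_000)]
    rows += [("small", rng.gauss(50.0, 10.0)) for _ in range(20)]
    return rows


rows = make_rows()
exact = exact_group_means(rows)                     # touches every row
approx = sampled_group_means(rows, fraction=0.01)   # touches about 1% of rows
```

Comparing `approx["large"]` with `exact["large"]` shows the time-for-accuracy trade the summary describes: the estimate comes from roughly 1,000 rows rather than 100,020. A group with very few rows, such as the hypothetical `small` group here, may receive few or no sampled rows under uniform sampling, so its estimate can be poor or missing entirely.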