Comparative studies of sampling for analytics on massive data
Published in | 2016 3rd International Conference on Systems and Informatics (ICSAI) pp. 1002 - 1007 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.11.2016 |
Summary: | Groupwise analytics on big data have been widely used in statistics, computer science, parallel computing and many other fields in recent years. At the same time, aggregation queries are among the most important analytics techniques. In the big data era, aggregation queries over ever-increasing data volumes consume a great deal of time, so the traditional approach of traversing the entire dataset is no longer acceptable to users. Data sampling processes only part of the data to obtain an approximate result; when dealing with vast amounts of data, it can save a lot of time at the cost of some accuracy. This paper introduces several data sampling algorithms for approximate aggregation queries on big data and analyzes the advantages and shortcomings of each method, including a technique that applies to sparse data, meaning data with a limited population but a wide range. |
---|---|
DOI: | 10.1109/ICSAI.2016.7811097 |
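The core idea in the summary, answering an aggregation query from a sample instead of a full scan, can be illustrated with a minimal Python sketch. It uses plain uniform random sampling without replacement, which is only one generic technique and not necessarily any of the algorithms the paper compares. The function name `approximate_aggregates` and the normal-approximation error bound are assumptions for illustration. The sketch estimates AVG from the sample mean and SUM by scaling that mean up to the population size, and reports an approximate confidence half-width that quantifies the accuracy given up.

```python
import math
import random
import statistics


def approximate_aggregates(data, sample_size, seed=None, z=1.96):
    """Hypothetical helper for illustration; not taken from the paper.

    Estimates AVG and SUM of `data` (a list) from a simple random sample.
    Returns (estimated_avg, estimated_sum, sum_half_width), where
    sum_half_width is an approximate confidence half-width for the SUM
    estimate under a normal approximation (z=1.96 gives roughly 95%).
    """
    population = len(data)
    n = min(sample_size, population)
    if n < 2:
        raise ValueError("need a sample of at least 2 items")
    rng = random.Random(seed)
    # Simple random sampling without replacement: only n items are aggregated.
    sample = rng.sample(data, n)
    est_avg = statistics.fmean(sample)
    # Scale the sample mean up to the population size to estimate SUM.
    est_sum = est_avg * population
    # Standard error of the mean, with the finite population correction,
    # which shrinks to 0 when the whole population is sampled.
    s = statistics.stdev(sample)
    fpc = math.sqrt((population - n) / (population - 1))
    se_avg = s / math.sqrt(n) * fpc
    return est_avg, est_sum, z * se_avg * population


if __name__ == "__main__":
    data = list(range(1, 1_000_001))
    avg, total, half = approximate_aggregates(data, 10_000, seed=42)
    print(f"AVG ~ {avg:.1f}, SUM ~ {total:.0f} +/- {half:.0f}")
    print(f"exact SUM = {sum(data)}")
```

The trade-off described in the summary shows up directly: a larger `sample_size` narrows the half-width but reads more data. Note that a uniform sample can miss rare values or small groups entirely, so its accuracy depends on how the data are distributed.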