Scalable Algorithms and Associative Statistics
Published in | Algorithms for Data Science pp. 51 - 104 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Cham: Springer International Publishing, 2016 |
Summary: | A single computer is often inadequate for handling a massively large data set. The usual problems are that processing takes too long and that the data volume exceeds the host’s storage capacity. Cleverly designed algorithms can sometimes reduce the processing time to an acceptable level, but the single-host solution will eventually fail if the data volume is sufficiently great. A far-reaching solution to the data volume problem replaces the single host with a network of computers across which the data are distributed and processed. However, the hardware solution is incomplete until the data processing algorithms are adapted to the distributed computing environment. A complete solution requires algorithms that are scalable. Scalability depends on the statistics computed by the algorithm, and the statistics that allow for scalability are associative statistics. Scalability and associative statistics are the subject of this chapter. |
---|---|
ISBN: | 3319457950 9783319457956 |
DOI: | 10.1007/978-3-319-45797-0_3 |
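
The summary's link between scalability and associative statistics can be illustrated with a minimal sketch. It is not taken from the chapter, and it assumes that "associative" means a statistic computed separately on disjoint blocks of data can be combined (here by simple addition) into the statistic for the whole data set. The function names `partial_stats`, `merge`, and `mean_and_variance` are hypothetical, and the three in-memory blocks stand in for data held on separate hosts.

```python
from functools import reduce


def partial_stats(block):
    """Return (count, sum, sum of squares) for one block of numbers."""
    return (len(block), sum(block), sum(x * x for x in block))


def merge(a, b):
    """Combine two partial results by elementwise addition.

    Addition is associative, so blocks can be merged in any grouping.
    """
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(stats):
    """Derive the mean and population variance from the merged triple."""
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean * mean


# Assumption: each inner list plays the role of the data on one host.
blocks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

combined = reduce(merge, map(partial_stats, blocks))
mean, variance = mean_and_variance(combined)
```

Only the small triples travel between hosts in such a scheme, never the raw data, which is one plausible reading of why statistics of this kind permit scalable algorithms. The sum-of-squares formula for the variance is chosen here for brevity and can lose precision on real data.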