Scalable Algorithms and Associative Statistics



Bibliographic Details
Published in: Algorithms for Data Science, pp. 51-104
Main Authors: Steele, Brian; Chandler, John; Reddy, Swarna
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing, 2016

More Information
Summary: It is not uncommon for a single computer to be inadequate for handling a massively large data set. The common problems are that processing the data takes too long and that the data volume exceeds the storage capacity of the host. Cleverly designed algorithms can sometimes reduce the processing time to an acceptable point, but a single-host solution will eventually fail if the data volume is sufficiently great. A far-reaching solution to the data volume problem replaces the single host with a network of computers across which the data are distributed and processed. However, the hardware solution is incomplete until the data processing algorithms are adapted to the distributed computing environment. A complete solution requires algorithms that are scalable. Scalability depends on the statistics being computed by the algorithm, and the statistics that allow for scalability are associative statistics. Scalability and associative statistics are the subject of this chapter.
ISBN: 3319457950; 9783319457956
DOI: 10.1007/978-3-319-45797-0_3
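The summary ties scalability to associative statistics but does not show one. The following is a minimal Python sketch, not the chapter's own code. It assumes the common reading that a statistic is associative when its value on a full data set can be obtained by combining its values on disjoint partitions of that data. It uses the triple (count, sum, sum of squares), which each host can compute on its own partition and which can be merged by elementwise addition in any order or grouping. The names `Moments`, `partial_moments`, `combine`, and `mean_and_variance` are hypothetical illustrations.

```python
from functools import reduce
from typing import Iterable, NamedTuple, Tuple


class Moments(NamedTuple):
    """Hypothetical associative statistic: count, sum, and sum of squares."""
    n: int
    s: float
    ss: float


def partial_moments(block: Iterable[float]) -> Moments:
    """Compute the statistic on one data partition (e.g. on one host)."""
    n, s, ss = 0, 0.0, 0.0
    for x in block:
        n += 1
        s += x
        ss += x * x
    return Moments(n, s, ss)


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two partial statistics; elementwise addition is associative,
    so partitions can be merged in any grouping."""
    return Moments(a.n + b.n, a.s + b.s, a.ss + b.ss)


def mean_and_variance(m: Moments) -> Tuple[float, float]:
    """Recover the mean and sample variance from the merged statistic."""
    mean = m.s / m.n
    # Sample variance from the sum-of-squares identity (requires n > 1).
    var = (m.ss - m.n * mean * mean) / (m.n - 1)
    return mean, var


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # Simulate three unequal partitions spread across hosts.
    partitions = [data[0:3], data[3:5], data[5:]]
    total = reduce(combine, map(partial_moments, partitions))
    print(total)
    print(mean_and_variance(total))
```

By contrast, the mean on its own is not associative in this sense. Averaging the three partition means above (10/3, 4.5, and 7) does not give the overall mean of 5.0, because the partitions differ in size. Carrying `n` and the sum alongside each partial result fixes this. The sum-of-squares identity is simple but can lose precision on large or poorly scaled data. Pairwise merging of Welford-style running means and centered sums of squares is a numerically safer associative alternative.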