Scalable Algorithms and Associative Statistics

Bibliographic Details
Published in	Algorithms for Data Science, pp. 51-104
Main Authors	Steele, Brian; Chandler, John; Reddy, Swarna
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing, 2016
Subjects	Associative Statistic; Behavioral Risk Factor Surveillance System; Body Mass Index; Linear Regression Model; Python Script

Summary:	It’s not uncommon that a single computer is inadequate to handle a massively large data set. The common problems are that it takes too long to process the data and the data volume exceeds the storage capacity of the host. Cleverly designed algorithms sometimes can reduce the processing time to an acceptable point, but the single host solution will eventually fail if data volume is sufficiently great. A far-reaching solution to the data volume problem replaces the single host with a network of computers across which the data are distributed and processed. However, the hardware solution is incomplete until the data processing algorithms are adapted to the distributed computing environment. A complete solution requires algorithms that are scalable. Scalability depends on the statistics that are being computed by the algorithm, and the statistics that allow for scalability are associative statistics. Scalability and associative statistics are the subject of this chapter.
ISBN:	3319457950, 9783319457956
DOI:	10.1007/978-3-319-45797-0_3
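
The summary's link between scalability and associative statistics can be illustrated with a minimal Python sketch. It assumes that an associative statistic is one whose value on the whole data set equals the sum of its values on disjoint partitions; the summary itself does not spell out this definition. The function names (`partial_stats`, `combine`, `mean_and_variance`) and the toy data are hypothetical and are not taken from the chapter.

```python
from functools import reduce

def partial_stats(chunk):
    """Return (count, sum, sum of squares) for one partition of the data."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))

def combine(a, b):
    """Merge two partial results by component-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def mean_and_variance(stats):
    """Derive the sample mean and sample variance from the merged statistics."""
    n, s, ss = stats
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)
    return mean, variance

# Toy partitions standing in for data spread across several hosts (assumption).
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Each partition is summarized independently, then the summaries are merged.
total = reduce(combine, map(partial_stats, partitions))
mean, variance = mean_and_variance(total)
```

Because each partition is reduced to a fixed-size tuple before merging, only the tuples need to travel between hosts, and the merge order does not affect the result beyond floating-point rounding. The sum-of-squares formula for the variance is used here for brevity; it can lose precision when the mean is large relative to the spread.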