Preserving the Value of Large Scale Data Analytics over Time Through Selective Re-computation

Bibliographic Details
Published in: Data Analytics, pp. 65-77
Main Authors: Missier, Paolo; Cała, Jacek; Rathi, Manisha
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing
Series: Lecture Notes in Computer Science

More Information
Summary: A pervasive problem in Data Science is that the knowledge generated by possibly expensive analytics processes is subject to decay over time, as the data and algorithms used to compute it change and the external knowledge embodied by reference datasets evolves. Deciding when such knowledge outcomes should be refreshed, following a sequence of data change events, requires problem-specific functions to quantify their value and its decay over time, as well as models for estimating the cost of their re-computation. What makes this challenging is the ambition to develop a decision support system for informing re-computation decisions over time that is both generic and customisable. With the help of a case study from genomics, in this paper we offer an initial formalisation of this problem, highlight research challenges, and outline a possible approach based on the analysis of metadata from a history of past computations.
ISBN: 9783319607948; 3319607944
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-319-60795-5_6
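The summary above mentions problem-specific value functions, decay over time, and re-computation cost models, but this record does not reproduce the paper's formalisation. Purely as an illustrative sketch of the shape such a decision rule could take, the short Python fragment below refreshes an outcome when the value lost since its last computation exceeds the estimated cost of re-computing it; every name (Outcome, current_value, should_refresh) and the geometric per-event decay model are assumptions made for illustration, not the authors' actual formulation.

from dataclasses import dataclass

@dataclass
class Outcome:
    """A knowledge outcome produced by an analytics process (hypothetical model)."""
    value_at_creation: float   # problem-specific value when first computed
    decay_rate: float          # assumed fraction of value lost per data-change event
    recompute_cost: float      # estimated cost of re-running the analytics

def current_value(o: Outcome, change_events: int) -> float:
    # Value decays geometrically with each relevant data/reference change event
    # (an illustrative choice; the paper argues decay functions are problem-specific).
    return o.value_at_creation * (1.0 - o.decay_rate) ** change_events

def should_refresh(o: Outcome, change_events: int) -> bool:
    # Refresh when the value recoverable by re-computation exceeds its cost.
    lost_value = o.value_at_creation - current_value(o, change_events)
    return lost_value > o.recompute_cost

# Example: after 5 change events, ~41 units of value are lost, which exceeds
# the re-computation cost of 30, so a refresh is worthwhile.
o = Outcome(value_at_creation=100.0, decay_rate=0.1, recompute_cost=30.0)
print(should_refresh(o, change_events=5))  # True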