A Framework for Managing Continuous Query Evaluations over Voluminous, Multidimensional Datasets

Bibliographic Details
Published in: 2014 International Conference on Cloud and Autonomic Computing, pp. 73-82
Main Authors: Tolooee, Cameron; Malensek, Matthew; Pallickara, Sangmi Lee
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2014
Summary: Efficient access to voluminous multidimensional datasets is essential for several scientific applications, including real-time analysis and visualization. Fast-evolving datasets present unique challenges during retrievals. Keeping data up to date can be expensive and may involve the following: repeated data queries, excessive data movements, and redundant data preprocessing. This paper focuses on the issue of efficient manipulation of query results in cases where the dataset is continuously evolving. Our approach provides an automated and scalable tracking and caching mechanism to evaluate continuous queries over data stored in a distributed storage system. Among the storage nodes, one or more nodes are selected using an election algorithm based on CPU and memory utilization. These selected nodes ensure that the query output contains the most recent data arrivals and cache the metadata of the query output. This approach is evaluated in the context of Galileo, our distributed data storage framework. Galileo is designed for managing multidimensional time-series datasets generated in geospatial observational settings, e.g., data generated by remote sensing equipment and sensor networks. We describe our approach of using the metadata graph to push data preprocessing jobs onto the storage system during the continuous query processing and selectively download subsets of the query output. Our performance benchmarks demonstrate the efficacy of our approach.
DOI: 10.1109/ICCAC.2014.25
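
Illustrative sketch: the summary states that one or more storage nodes are selected by an election algorithm based on CPU and memory utilization, but this record does not give the algorithm itself. The Python below is only a minimal sketch of one plausible reading, in which the least-loaded nodes win. The names `NodeStats`, `load_score`, and `elect_nodes`, the equal weighting of CPU and memory, and the tie-break on node name are all assumptions made for illustration; they are not taken from the paper or from Galileo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical utilization report for one storage node."""
    name: str
    cpu: float     # fraction of CPU in use, 0.0 to 1.0
    memory: float  # fraction of memory in use, 0.0 to 1.0


def load_score(node: NodeStats, cpu_weight: float = 0.5) -> float:
    """Combine CPU and memory utilization into one score (assumed weighting)."""
    return cpu_weight * node.cpu + (1.0 - cpu_weight) * node.memory


def elect_nodes(nodes, count=1, cpu_weight=0.5):
    """Return the `count` least-loaded nodes, lowest score first.

    Ties are broken by node name so the result is deterministic.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    ranked = sorted(nodes, key=lambda n: (load_score(n, cpu_weight), n.name))
    return ranked[:count]


if __name__ == "__main__":
    cluster = [
        NodeStats("node-a", cpu=0.9, memory=0.8),
        NodeStats("node-b", cpu=0.2, memory=0.3),
        NodeStats("node-c", cpu=0.4, memory=0.2),
    ]
    for winner in elect_nodes(cluster, count=2):
        print(winner.name)
```

In this sketch the elected nodes would be the ones that track new data arrivals and cache query-output metadata; a real system would also need to re-run the election as utilization changes, which is left out here.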