Rainbow: Adaptive Layout Optimization for Wide Tables

Bibliographic Details
Published in: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1657-1660
Main Authors: Haoqiong Bian, Youxian Tao, Guodong Jin, Yueguo Chen, Xiongpai Qin, Xiaoyong Du
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2018
Summary: Popular column stores such as ORC and Parquet are widely used in Hadoop-oriented data analysis systems. Because column stores provide effective column skipping and data compression, many big data analysis applications use wide tables with hundreds or even thousands of columns to avoid expensive distributed joins. We found that the performance of such systems can be further improved by optimizing the physical data layout to fit particular workloads and system settings. However, performing such optimization manually is nontrivial. In this demo, we present a data layout optimization tool called Rainbow, which uses workload-driven layout optimization algorithms to adjust data layouts adaptively without interfering with data blocks that have already been stored. We also provide a Web UI through which users can interact with the layout optimization process. Furthermore, Rainbow is open source and comes with an accompanying benchmark for evaluating the performance of wide tables.
ISSN: 2375-026X
DOI: 10.1109/ICDE.2018.00200