Offset-value coding in database query processing

Recent work shows how offset-value coding speeds up database query execution, not only sorting but also duplicate removal and grouping (aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already saves thousands of CPUs in Google's Napa and F1 Query sys...

Full description

Saved in:

Bibliographic Details
Published in	arXiv.org
Main Authors	Graefe, Goetz, Do, Thanh
Format	Paper
Language	English
Published	Ithaca Cornell University Library, arXiv.org 17.02.2023
Subjects	Algorithms Coding Queries Query processing
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Recent work shows how offset-value coding speeds up database query execution, not only sorting but also duplicate removal and grouping (aggregation) in sorted streams, order-preserving exchange (shuffle), merge join, and more. It already saves thousands of CPUs in Google's Napa and F1 Query systems, e.g., in grouping algorithms and in log-structured merge-forests. In order to realize the full benefit of interesting orderings, however, query execution algorithms must not only consume and exploit offset-value codes but also produce offset-value codes for the next operator in the pipeline. Our research has sought ways to produce offset-value codes without comparing successive output rows one-by-one, column-by-column. This short paper introduces a new theorem and, based on its proof and a simple corollary, describes in detail how order-preserving algorithms (from filter to merge join and even shuffle) can compute offset-value codes for their outputs. These computations are surprisingly simple and very efficient.
ISSN:	2331-8422