Real-Time LSM-Trees for HTAP Workloads

Bibliographic Details
Published in 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 1208-1220
Main Authors Saxena, Hemant, Golab, Lukasz, Idreos, Stratos, Ilyas, Ihab F.
Format Conference Proceeding
Language English
Published IEEE 01.04.2023

Abstract Real-time analytics systems employ hybrid data layouts in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and support high insert rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge (LSM) Tree is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification to allow different data layouts in different levels, ranging from purely row-oriented to purely column-oriented, leading to a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER, a prototype implementation of our idea built on top of the RocksDB key-value store.
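The abstract's central idea, an LSM-Tree whose levels use different layouts ranging from purely row-oriented to purely column-oriented, can be illustrated with a toy in-memory model. This is a minimal sketch under our own assumptions, not LASER's implementation or the paper's design algorithm: the class names `Level` and `HybridLSM`, the column-group representation of each level, the count-based `capacity` trigger, and the whole-level compaction that re-splits rows into the next level's column groups are all hypothetical simplifications (no sorted runs, tombstones, or on-disk files).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class Level:
    """One level of a toy hybrid-layout LSM-Tree (hypothetical sketch).

    column_groups [["a", "b", "c"]] is purely row-oriented;
    [["a"], ["b"], ["c"]] is purely column-oriented.
    """
    column_groups: List[List[str]]
    capacity: int  # hypothetical count-based compaction trigger
    groups: List[Dict[int, Tuple]] = field(init=False)

    def __post_init__(self) -> None:
        # One key -> values map per column group.
        self.groups = [{} for _ in self.column_groups]

    def size(self) -> int:
        return len(self.groups[0])

    def put(self, key: int, row: Row) -> None:
        # Split the row so each column group stores only its own columns.
        for cols, store in zip(self.column_groups, self.groups):
            store[key] = tuple(row[c] for c in cols)

    def rows(self):
        # Stitch column groups back into full rows, in key order.
        for key in sorted(self.groups[0]):
            row: Row = {}
            for cols, store in zip(self.column_groups, self.groups):
                row.update(zip(cols, store[key]))
            yield key, row

    def project(self, key: int, wanted: List[str]) -> Row:
        # Read only the column groups that contain a requested column.
        out: Row = {}
        for cols, store in zip(self.column_groups, self.groups):
            if any(c in wanted for c in cols):
                for c, v in zip(cols, store[key]):
                    if c in wanted:
                        out[c] = v
        return out

    def columns_read(self, wanted: List[str]) -> int:
        # Columns physically touched by a projection: whole groups are read.
        return sum(len(cols) for cols in self.column_groups
                   if any(c in wanted for c in cols))


class HybridLSM:
    def __init__(self, levels: List[Level]) -> None:
        self.levels = levels

    def put(self, key: int, row: Row) -> None:
        # New writes land in the first (row-oriented) level.
        self.levels[0].put(key, row)
        self._maybe_compact(0)

    def _maybe_compact(self, i: int) -> None:
        lvl = self.levels[i]
        if lvl.size() <= lvl.capacity or i + 1 == len(self.levels):
            return
        nxt = self.levels[i + 1]
        # Records propagate down one level and are re-laid-out on the way;
        # level i holds newer versions, so they overwrite older ones.
        for key, row in list(lvl.rows()):
            nxt.put(key, row)
        lvl.groups = [{} for _ in lvl.column_groups]
        self._maybe_compact(i + 1)

    def get(self, key: int, wanted: List[str]) -> Optional[Row]:
        # Probe newest to oldest; the first level holding the key wins.
        for lvl in self.levels:
            if key in lvl.groups[0]:
                return lvl.project(key, wanted)
        return None


tree = HybridLSM([
    Level([["a", "b", "c"]], capacity=2),          # row-oriented
    Level([["a"], ["b", "c"]], capacity=4),        # column groups
    Level([["a"], ["b"], ["c"]], capacity=10**9),  # column-oriented
])
for k in range(6):
    tree.put(k, {"a": k, "b": k * 10, "c": k * 100})
tree.put(2, {"a": 2, "b": 999, "c": 200})  # update: newest version stays in level 0

print(tree.get(2, ["b"]))        # {'b': 999}
print(tree.get(4, ["a", "c"]))   # {'a': 4, 'c': 400}
print([lvl.columns_read(["b"]) for lvl in tree.levels])  # [3, 2, 1]
```

The last line hints at the trade-off the abstract describes: inserts always land in the row-oriented first level, while a one-column projection touches all three columns there but only one column in the column-oriented last level. A real engine would choose the per-level layouts and sizes from a workload cost model, which this sketch does not attempt.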
Author_xml – sequence: 1
  givenname: Hemant
  surname: Saxena
  fullname: Saxena, Hemant
  email: h.saxena@sap.com
  organization: SAP Labs, Waterloo
– sequence: 2
  givenname: Lukasz
  surname: Golab
  fullname: Golab, Lukasz
  email: lgolab@uwaterloo.ca
  organization: University of Waterloo
– sequence: 3
  givenname: Stratos
  surname: Idreos
  fullname: Idreos, Stratos
  email: stratos@seas.harvard.edu
  organization: Harvard University
– sequence: 4
  givenname: Ihab F.
  surname: Ilyas
  fullname: Ilyas, Ihab F.
  email: ilyas@uwaterloo.ca
  organization: University of Waterloo
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICDE55515.2023.00097
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350322279
EISSN 2375-026X
EndPage 1220
ExternalDocumentID 10184774
Genre orig-research
ID FETCH-LOGICAL-i119t-73ea1811ec265aa2c394f52c49032dc4fa0caa3b8d63adfd4505eb8bc193dd0b3
IEDL.DBID RIE
IngestDate Wed Jun 26 19:25:30 EDT 2024
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_10184774
PublicationCentury 2000
PublicationDate 2023-April
PublicationDateYYYYMMDD 2023-04-01
PublicationDate_xml – month: 04
  year: 2023
  text: 2023-April
PublicationDecade 2020
PublicationTitle 2023 IEEE 39th International Conference on Data Engineering (ICDE)
PublicationTitleAbbrev ICDE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0000941150
SourceID ieee
SourceType Publisher
StartPage 1208
SubjectTerms Costs
Data engineering
HTAP
Laser modes
Layout
LSM Trees
Prototypes
Real-time systems
Storage
Throughput
Title Real-Time LSM-Trees for HTAP Workloads
URI https://ieeexplore.ieee.org/document/10184774
hasFullText 1
inHoldings 1
linkProvider IEEE