Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing
Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results is dependent on a system's ability t...
Published in | 2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS) Vol. 30; pp. 553 - 561 |
---|---|
Main Authors | Geldenhuys, Morgan K.; Pfister, Benjamin J. J.; Scheinert, Dominik; Thamsen, Lauritz; Kao, Odej |
Format | Conference Proceeding; Journal Article |
Language | English |
Published | Polish Information Processing Society, 01.01.2022 |
Subjects | |
Online Access | Get full text |
ISSN | 2300-5963 |
DOI | 10.15439/2022F225 |
Abstract | Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results depends on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing checkpoint and rollback recovery. However, owing to the statistical probability of partial failures occurring in these distributed environments and the variability of workloads upon which jobs are expected to operate, static configurations will often not meet Quality of Service constraints with low overhead. In this paper, we present Khaos, a new approach that utilizes the parallel processing capabilities of cloud orchestration technologies for the automatic runtime optimization of fault tolerance configurations in Distributed Stream Processing jobs. Our approach employs three successive phases that borrow from the principles of Chaos Engineering: establish the steady-state processing conditions, conduct experiments to better understand how the system performs under failure, and use this knowledge to continuously minimize Quality of Service violations. We implemented Khaos prototypically together with Apache Flink and demonstrate its usefulness experimentally. |
---|---|
Author | Kao, Odej; Scheinert, Dominik; Thamsen, Lauritz; Geldenhuys, Morgan K.; Pfister, Benjamin J. J. |
Author_xml | – sequence: 1 givenname: Morgan K. surname: Geldenhuys fullname: Geldenhuys, Morgan K. email: morgan.k.geldenhuys@tu-berlin.de organization: Technische Universität, Berlin, Germany – sequence: 2 givenname: Benjamin J. J. surname: Pfister fullname: Pfister, Benjamin J. J. email: benjamin.j.j.pfister@tu-berlin.de organization: Technische Universität, Berlin, Germany – sequence: 3 givenname: Dominik surname: Scheinert fullname: Scheinert, Dominik email: dominik.scheinert@tu-berlin.de organization: Technische Universität, Berlin, Germany – sequence: 4 givenname: Lauritz surname: Thamsen fullname: Thamsen, Lauritz email: lauritz.thamsen@tu-berlin.de organization: University of Glasgow, United Kingdom – sequence: 5 givenname: Odej surname: Kao fullname: Kao, Odej email: odej.kao@tu-berlin.de organization: Technische Universität, Berlin, Germany |
ContentType | Conference Proceeding Journal Article |
DBID | 6IE 6IL CBEJK RIE RIL DOA |
DOI | 10.15439/2022F225 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present DOAJ Directory of Open Access Journals |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9788396242396 8396242399 |
EISSN | 2300-5963 |
EndPage | 561 |
ExternalDocumentID | oai_doaj_org_article_d1623bd0f11942adae161d35edbea5cd 9909140 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL 6IF 6IN AAJGR AAWTH ABLEC ADBBV ADZIZ ALMA_UNASSIGNED_HOLDINGS BCNDV BEFXN BFFAM BGNUA BKEBE BPEOZ CHZPO GROUPED_DOAJ IEGSK M~E OCL OK1 Y2W |
ID | FETCH-LOGICAL-d304t-fc4a931e6d7ce2becf17026868e3a909d98cea37535ccde96c4ee517eae863a43 |
IEDL.DBID | DOA |
IngestDate | Wed Aug 27 01:14:48 EDT 2025 Thu Jan 18 11:14:33 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-d304t-fc4a931e6d7ce2becf17026868e3a909d98cea37535ccde96c4ee517eae863a43 |
OpenAccessLink | https://doaj.org/article/d1623bd0f11942adae161d35edbea5cd |
PageCount | 9 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_d1623bd0f11942adae161d35edbea5cd ieee_primary_9909140 |
PublicationCentury | 2000 |
PublicationDate | 2022-01-01 |
PublicationDateYYYYMMDD | 2022-01-01 |
PublicationDate_xml | – month: 01 year: 2022 text: 2022-01-01 day: 01 |
PublicationDecade | 2020 |
PublicationTitle | 2022 17th Conference on Computer Science and Intelligence Systems (FedCSIS) |
PublicationTitleAbbrev | FedCSIS |
PublicationYear | 2022 |
Publisher | Polish Information Processing Society |
Publisher_xml | – name: Polish Information Processing Society |
SSID | ssj0002140246 |
Score | 2.2356923 |
Snippet | Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their... |
SourceID | doaj ieee |
SourceType | Open Website Publisher |
StartPage | 553 |
SubjectTerms | Chaos; Chaos Engineering; Cloud; Distributed Stream Processing; Fault tolerance; Fault tolerant systems; Parallel processing; Parallel Profiling; Probability; QoS Modeling; Quality of service; Runtime; Runtime Optimization |
Title | Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing |
URI | https://ieeexplore.ieee.org/document/9909140 https://doaj.org/article/d1623bd0f11942adae161d35edbea5cd |
Volume | 30 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Directory of Open Access Journals |
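
The abstract describes three phases: establishing steady-state conditions, running failure experiments, and using the results to minimize Quality of Service violations. The sketch below illustrates only the last step in a simplified form and is not the authors' implementation. It assumes that profiling has already produced, for each candidate checkpoint interval, a steady-state latency and a recovery time. The names `Measurement`, `choose_interval`, and `SAMPLE`, the selection rule, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One hypothetical profiling result for a candidate checkpoint interval."""
    interval_s: float   # checkpoint interval under test, in seconds
    latency_ms: float   # average end-to-end latency in steady state
    recovery_s: float   # time to recover after an injected failure


def choose_interval(
    measurements: Iterable[Measurement],
    max_latency_ms: float,
    max_recovery_s: float,
) -> Optional[Measurement]:
    """Return the feasible measurement with the lowest latency, or None."""
    # Keep only candidates that satisfy both QoS constraints.
    feasible = [
        m for m in measurements
        if m.latency_ms <= max_latency_ms and m.recovery_s <= max_recovery_s
    ]
    # Prefer low steady-state latency; break ties toward faster recovery.
    return min(feasible, key=lambda m: (m.latency_ms, m.recovery_s), default=None)


# Invented numbers for illustration only: short intervals add checkpointing
# overhead (higher latency), long intervals lengthen recovery.
SAMPLE = [
    Measurement(interval_s=10.0, latency_ms=220.0, recovery_s=15.0),
    Measurement(interval_s=30.0, latency_ms=150.0, recovery_s=40.0),
    Measurement(interval_s=60.0, latency_ms=120.0, recovery_s=75.0),
    Measurement(interval_s=120.0, latency_ms=110.0, recovery_s=140.0),
]
```

Calling `choose_interval(SAMPLE, max_latency_ms=200.0, max_recovery_s=90.0)` selects among the candidates that meet both constraints; a runtime optimizer in the spirit of the abstract would repeat such a selection as workloads change, rather than fixing the interval once.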