Turbine: Facebook's Service Management Platform for Stream Processing
Published in | Data Engineering (ICDE 2020), pp. 1591-1602 |
---|---|
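Main Authors | Yuan Mei, Luwei Cheng, Vanish Talwar, Michael Y. Levin, Gabriela Jacques-Silva, Nikhil Simha, Anirban Banerjee, Brian Smith, Tim Williamson, Serhat Yilmaz, Weitao Chen, Guoqiang Jerry Chen (Facebook Inc) |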
Format | Conference Proceeding |
Language | English |
Published | IEEE, April 2020 |
Subjects | Cluster Management; Stream Processing |
ISSN | 2375-026X |
DOI | 10.1109/ICDE48307.2020.00141 |
Abstract | The demand for stream processing at Facebook has grown as services increasingly rely on real-time signals to speed up decisions and actions. Emerging real-time applications require strict Service Level Objectives (SLOs) with low downtime and processing lag, even in the presence of failures and load variability. Addressing this challenge at Facebook scale led to the development of Turbine, a management platform designed to bridge the gap between the capabilities of the existing general-purpose cluster management frameworks and Facebook's stream processing requirements. Specifically, Turbine features a fast and scalable task scheduler; an efficient predictive auto scaler; and an application update mechanism that provides fault-tolerance, atomicity, consistency, isolation and durability. Turbine has been in production for over three years and is one of the core technologies that enabled the rapid growth of stream processing at Facebook. It is currently deployed on clusters spanning tens of thousands of machines, managing several thousand streaming pipelines that process terabytes of data per second in real time. Our production experience has validated Turbine's effectiveness: its task scheduler evenly balances workload fluctuations across clusters; its auto scaler effectively and predictively handles unplanned load spikes; and the application update mechanism consistently and efficiently completes large-scale updates within minutes. This paper describes the Turbine architecture, discusses the design choices behind it, and shares several case studies demonstrating Turbine's capabilities in production. |
---|---|
Author | Mei, Yuan; Cheng, Luwei; Talwar, Vanish; Levin, Michael Y.; Jacques-Silva, Gabriela; Simha, Nikhil; Banerjee, Anirban; Smith, Brian; Williamson, Tim; Yilmaz, Serhat; Chen, Weitao; Chen, Guoqiang Jerry |
Affiliation | Facebook Inc (all authors) |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICDE48307.2020.00141 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 1728129036; 9781728129037 |
EISSN | 2375-026X |
EndPage | 1602 |
ExternalDocumentID | 9101738 |
Genre | orig-research |
GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:42:32 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
PageCount | 12 |
ParticipantIDs | ieee_primary_9101738 |
PublicationCentury | 2000 |
PublicationDate | 2020-April |
PublicationDateYYYYMMDD | 2020-04-01 |
PublicationDate_xml | – month: 04 year: 2020 text: 2020-April |
PublicationDecade | 2020 |
PublicationTitle | Data engineering |
PublicationTitleAbbrev | ICDE |
PublicationYear | 2020 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0000941150 |
Score | 2.308621 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1591 |
SubjectTerms | Cluster Management; Stream Processing |
Title | Turbine: Facebook's Service Management Platform for Stream Processing |
URI | https://ieeexplore.ieee.org/document/9101738 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=Data+engineering&rft.atitle=Turbine%3A+Facebook%27s+Service+Management+Platform+for+Stream+Processing&rft.au=Mei%2C+Yuan&rft.au=Cheng%2C+Luwei&rft.au=Talwar%2C+Vanish&rft.au=Levin%2C+Michael+Y.&rft.date=2020-04-01&rft.pub=IEEE&rft.eissn=2375-026X&rft.spage=1591&rft.epage=1602&rft_id=info:doi/10.1109%2FICDE48307.2020.00141&rft.externalDocID=9101738 |