Turbine: Facebook's Service Management Platform for Stream Processing

Bibliographic Details
Published in Data engineering, pp. 1591-1602
Main Authors Mei, Yuan; Cheng, Luwei; Talwar, Vanish; Levin, Michael Y.; Jacques-Silva, Gabriela; Simha, Nikhil; Banerjee, Anirban; Smith, Brian; Williamson, Tim; Yilmaz, Serhat; Chen, Weitao; Chen, Guoqiang Jerry
Format Conference Proceeding
Language English
Published IEEE, April 2020
Subjects Cluster Management; Stream Processing
ISSN 2375-026X
DOI 10.1109/ICDE48307.2020.00141

Abstract The demand for stream processing at Facebook has grown as services increasingly rely on real-time signals to speed up decisions and actions. Emerging real-time applications require strict Service Level Objectives (SLOs) with low downtime and low processing lag, even in the presence of failures and load variability. Addressing this challenge at Facebook scale led to the development of Turbine, a management platform designed to bridge the gap between the capabilities of existing general-purpose cluster management frameworks and Facebook's stream processing requirements. Specifically, Turbine features a fast and scalable task scheduler, an efficient predictive auto scaler, and an application update mechanism that provides fault tolerance, atomicity, consistency, isolation, and durability. Turbine has been in production for over three years and is one of the core technologies that enabled the rapid growth of stream processing at Facebook. It is currently deployed on clusters spanning tens of thousands of machines, managing several thousand streaming pipelines that process terabytes of data per second in real time. Our production experience has validated Turbine's effectiveness: its task scheduler evenly balances workload fluctuations across clusters; its auto scaler effectively and predictively handles unplanned load spikes; and the application update mechanism consistently and efficiently completes large-scale updates within minutes. This paper describes the Turbine architecture, discusses the design choices behind it, and shares several case studies demonstrating Turbine's capabilities in production.
Affiliation Facebook Inc (all authors)
Discipline Computer Science
EISBN 1728129036; 9781728129037
Publication Title Abbreviation ICDE
URI https://ieeexplore.ieee.org/document/9101738