TerrierTail: Mitigating Tail Latency of Cloud Virtual Machines

Bibliographic Details
Published in IEEE Transactions on Parallel and Distributed Systems, Vol. 29, no. 10, pp. 2346-2359
Main Authors Asyabi, Esmail; SanaeeKohroudi, SeyedAlireza; Sharifi, Mohsen; Bestavros, Azer
Format Journal Article
Language English
Published New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2018-10-01

Abstract Large-scale online services parallelize sub-operations of a user's request across a large number of physical machines (service components) to improve responsiveness. Even a temporary latency spike in any service component can notably inflate the end-to-end delay; therefore, the tail of the latency distribution of service components has become a subject of intensive research. Key characteristics of clouds, such as elasticity and on-demand resource provisioning, have made them attractive for hosting large-scale online services in which VMs are the building blocks. However, adherence to traditional hypervisor scheduling policies has led to unpredictable CPU access latencies for the virtual CPUs (vCPUs) responsible for network IO processing. This has resulted in poor and unpredictable network IO performance, exacerbating VMs' long tail latencies and discouraging the hosting of large-scale parallel web services on virtualized clouds. This paper presents TerrierTail, a hypervisor CPU scheduler whose primary goal is to trim the tail of the latency distribution of individual VMs in virtualized clouds. In TerrierTail, we have modified the network driver to identify the vCPUs responsible for network IO processing. Leveraging this information, the TerrierTail scheduler reduces the CPU access latencies of these vCPUs using novel scheduling policies, resulting in higher and more predictable network IO performance and therefore lower tail latency. TerrierTail's gains come with no measurable negative impact on other performance attributes (e.g., fairness) or on the performance of VMs running other types of workloads (e.g., CPU-intensive VMs). A prototype implementation of TerrierTail in the Xen hypervisor substantially outperforms Xen's default Credit scheduler.
For example, TerrierTail reduces the 99.9th-percentile tail latency of a Memcached server by up to 53 percent and of an RPC server by up to 50 percent.
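The abstract's two quantitative ideas, that one slow component can inflate end-to-end delay and that results are reported as reductions at the 99.9th percentile, can be illustrated with a short Python sketch. It assumes components are independent (a simplification not stated in the record) and uses the nearest-rank percentile definition, which is only one of several conventions; the paper's exact measurement method is not given here. All function names and sample latencies are hypothetical.

```python
import math
from fractions import Fraction


def prob_any_component_slow(p_slow: float, fan_out: int) -> float:
    """Chance that at least one of `fan_out` independent components is slow,
    if each one is slow with probability `p_slow` (independence is assumed)."""
    return 1.0 - (1.0 - p_slow) ** fan_out


def percentile_nearest_rank(samples, pct: str):
    """Nearest-rank percentile. `pct` is a decimal string such as "99.9",
    parsed with Fraction so the rank is computed without float rounding."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(Fraction(pct) * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    # 1% slow components, 100-way fan-out: most requests hit a slow one.
    print(round(prob_any_component_slow(0.01, 100), 3))  # prints 0.634
    # Hypothetical latencies in ms, not data from the paper.
    baseline = [1.0] * 998 + [4.0, 9.0]
    improved = [1.0] * 998 + [2.0, 3.0]
    p_base = percentile_nearest_rank(baseline, "99.9")
    p_new = percentile_nearest_rank(improved, "99.9")
    print(p_base, p_new, percent_reduction(p_base, p_new))  # prints 4.0 2.0 50.0
```

With 1,000 samples, the 99.9th percentile is the 999th smallest value, so only the single worst sample is excluded; this is why rare per-component stalls dominate the tail that the abstract targets.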
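The abstract does not detail TerrierTail's scheduling policies, so the following is only a toy Python sketch of the general idea it describes: a hint (hypothetically set by a modified network driver) marks vCPUs that handle network IO, and the run queue dispatches those vCPUs ahead of other runnable ones. The class names, the `handles_net_io` flag, and the priority values are assumptions for illustration; the UNDER/OVER classes only loosely mirror the credit-based priorities of Xen's Credit scheduler.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical priority classes for this toy; lower value is dispatched first.
# NET_IO stands in for "vCPU flagged by the network driver"; UNDER/OVER loosely
# mirror Xen Credit's classes for vCPUs with and without remaining credit.
NET_IO, UNDER, OVER = 0, 1, 2


@dataclass
class VCpu:
    name: str
    credits: int
    handles_net_io: bool = False  # hypothetical hint from a modified network driver


class ToyRunQueue:
    """Flagged network-IO vCPUs first, then by credit class, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves FIFO order

    def _priority(self, v: VCpu) -> int:
        if v.handles_net_io:
            return NET_IO
        return UNDER if v.credits >= 0 else OVER

    def enqueue(self, v: VCpu) -> None:
        # The unique sequence number means VCpu objects are never compared.
        heapq.heappush(self._heap, (self._priority(v), next(self._seq), v))

    def pick_next(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    rq = ToyRunQueue()
    rq.enqueue(VCpu("cpu-bound", credits=-10))
    rq.enqueue(VCpu("web-worker", credits=50))
    rq.enqueue(VCpu("net-rx", credits=-5, handles_net_io=True))
    print([rq.pick_next().name for _ in range(len(rq))])
    # prints ['net-rx', 'web-worker', 'cpu-bound']
```

Here the flagged vCPU `net-rx` is dispatched first even though it has exhausted its credits. Unconditional precedence like this could starve CPU-bound vCPUs; since the abstract reports no measurable loss of fairness, TerrierTail's actual policies presumably bound this precedence in ways the sketch does not model.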
Author details
Asyabi, Esmail (ORCID 0000-0003-3616-4819), Computer Science Department, Boston University, Boston, MA; easyabi@bu.edu
SanaeeKohroudi, SeyedAlireza (ORCID 0000-0001-6461-1650), School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran; sarsanaee@comp.iust.ac.ir
Sharifi, Mohsen, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran; msharifi@iust.ac.ir
Bestavros, Azer (ORCID 0000-0003-0798-8835), Computer Science Department, Boston University, Boston, MA; best@bu.edu
CODEN ITDSEO
Cited by 10.1109/ACCESS.2022.3187731
10.3390/electronics9122107
10.1002/cpe.7196
10.1109/TWC.2023.3239531
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TPDS.2018.2827075
Discipline Engineering
Computer Science
EISSN 1558-2183
Genre orig-research
ISSN 1045-9219
IsPeerReviewed true
IsScholarly true
PageCount 14
PublicationTitleAbbrev TPDS
SubjectTerms Boosting
Central processing units
Cloud computing
CPU scheduler
CPUs
Delays
Elasticity
Policies
Provisioning
Resource allocation
Scheduling
Servers
tail latency
Task analysis
Virtual environments
Virtual machine monitors
virtualization
Web services
Xen
URI https://ieeexplore.ieee.org/document/8338088
https://www.proquest.com/docview/2117166962