TerrierTail: Mitigating Tail Latency of Cloud Virtual Machines
Published in | IEEE Transactions on Parallel and Distributed Systems, Vol. 29, no. 10, pp. 2346-2359 |
---|---|
Main Authors | Asyabi, Esmail; SanaeeKohroudi, SeyedAlireza; Sharifi, Mohsen; Bestavros, Azer |
Format | Journal Article |
Language | English |
Published | New York; IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE); 01.10.2018 |
Abstract | Large-scale online services parallelize sub-operations of a user's request across a large number of physical machines (service components) so as to enhance the responsiveness. Even a temporary spike in latency of any service component can notably inflate the end-to-end delay; therefore, the tail of the latency distribution of service components has become a subject of intensive research. The key characteristics of clouds such as elasticity and on-demand resource provisioning have made clouds attractive for hosting large-scale online services wherein VMs are the building blocks of services. However, adherence to traditional hypervisor scheduling policies has led to unpredictable CPU access latencies for virtual CPUs (vCPUs) that are responsible for performing network IO processes. This has resulted in poor and unpredictable performance for network IO, exacerbating VMs' long tail latencies and discouraging the hosting of large-scale parallel web services on virtualized clouds. This paper presents TerrierTail, a hypervisor CPU scheduler whose primary goal is to trim the tail of the latency distribution of individual VMs in virtualized clouds. In TerrierTail, we have modified the network driver to identify vCPUs that are responsible for performing network IO processes. Leveraging this information, the TerrierTail scheduler mitigates the CPU access latencies of such vCPUs using novel scheduling policies, resulting in a higher and more predictable network IO performance and therefore lower tail latency. TerrierTail's gains come at no measurable negative impacts on other performance attributes (e.g., fairness) or on the performance of VMs running other types of workloads (e.g., CPU-intensive VMs). A prototype implementation of TerrierTail in the Xen hypervisor substantially outperforms the default Credit scheduler of Xen. 
For example, TerrierTail mitigates the tail latency of a Memcached server by up to 53 percent and an RPC server by up to 50 percent at 99.9th percentile. |
---|---|
Author | Asyabi, Esmail; SanaeeKohroudi, SeyedAlireza; Bestavros, Azer; Sharifi, Mohsen |
Author_xml | 1. Asyabi, Esmail (ORCID: 0000-0003-3616-4819; email: easyabi@bu.edu; Computer Science Department, Boston University, Boston, MA); 2. SanaeeKohroudi, SeyedAlireza (ORCID: 0000-0001-6461-1650; email: sarsanaee@comp.iust.ac.ir; School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran); 3. Sharifi, Mohsen (email: msharifi@iust.ac.ir; School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran); 4. Bestavros, Azer (ORCID: 0000-0003-0798-8835; email: best@bu.edu; Computer Science Department, Boston University, Boston, MA) |
CODEN | ITDSEO |
CitedBy_id | crossref_primary_10_1109_ACCESS_2022_3187731 crossref_primary_10_3390_electronics9122107 crossref_primary_10_1002_cpe_7196 crossref_primary_10_1109_TWC_2023_3239531 |
Cites_doi | 10.1145/3132747.3132780 10.1109/TPDS.2017.2706268 10.1145/3064176.3064189 10.1145/2670979.2671008 10.1145/1165389.945462 10.1145/2678373.2665718 10.1007/s10586-016-0541-5 10.1145/2885497 10.1145/2830772.2830779 10.1016/j.future.2018.01.015 10.1145/3132747.3132763 10.1145/2670979.2671006 10.1145/3132747.3132762 10.1109/BigDataCongress.2016.13 10.1145/2408776.2408794 10.1145/1346256.1346258 10.1145/1508293.1508308 10.1145/2287076.2287080 10.1145/3054742 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
DOI | 10.1109/TPDS.2018.2827075 |
Discipline | Engineering; Computer Science |
EISSN | 1558-2183 |
EndPage | 2359 |
ExternalDocumentID | 10_1109_TPDS_2018_2827075 8338088 |
Genre | orig-research |
ISSN | 1045-9219 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 10 |
ORCID | 0000-0003-3616-4819 0000-0001-6461-1650 0000-0003-0798-8835 |
PageCount | 14 |
PublicationDate | 2018-10-01 |
PublicationPlace | New York |
PublicationTitle | IEEE Transactions on Parallel and Distributed Systems |
PublicationTitleAbbrev | TPDS |
PublicationYear | 2018 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 2346 |
SubjectTerms | Boosting; Central processing units; Cloud computing; CPU scheduler; CPUs; Delays; Elasticity; Policies; Provisioning; Resource allocation; Scheduling; Servers; tail latency; Task analysis; Virtual environments; Virtual machine monitors; virtualization; Web services; Xen |
Title | TerrierTail: Mitigating Tail Latency of Cloud Virtual Machines |
URI | https://ieeexplore.ieee.org/document/8338088 https://www.proquest.com/docview/2117166962 |
Volume | 29 |
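
The abstract's point that a latency spike in any one service component can inflate the end-to-end delay can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that a request fans out to independent components and must wait for the slowest one, and the function name `prob_request_hits_tail` is hypothetical.

```python
def prob_request_hits_tail(fanout: int, percentile: float) -> float:
    """Probability that at least one of `fanout` components is slower than
    its own latency `percentile` (e.g. 99.9).

    Assumption (not from the paper): component latencies are independent
    and the request completes only when the slowest component responds.
    """
    per_component_ok = percentile / 100.0
    return 1.0 - per_component_ok ** fanout


if __name__ == "__main__":
    # Show how quickly the chance of hitting a straggler grows with fan-out.
    for fanout in (1, 10, 100, 1000):
        p = prob_request_hits_tail(fanout, 99.9)
        print(f"fan-out {fanout:>4}: {p:.1%} of requests wait on a tail-latency component")
```

Under these assumptions the fraction of slow requests grows quickly with fan-out, which is one way to see why the abstract reports gains at the 99.9th percentile of individual VMs.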