Leader Set Selection for Low-Latency Geo-Replicated State Machine

Bibliographic Details
Published in	IEEE Transactions on Parallel and Distributed Systems, Vol. 28, no. 7, pp. 1933-1946
Main Authors	Liu, Shengyun; Vukolic, Marko
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.07.2017
Subjects	Commutation; Computer crashes; Computer networks; Delays; geo-replication; Gold; Indexes; latency optimization; Leadership; Partitions; Protocol; Protocols; Reliability; Replication; State machine replication; State machines; State of the art; state-partitioning
Summary:	Modern planetary-scale distributed systems largely rely on a State Machine Replication protocol to keep their service reliable, yet it comes with a specific challenge: latency, bounded by the speed of light. In particular, clients of a single-leader protocol, such as Paxos, must communicate with the leader, which must in turn communicate with other replicas: inappropriate selection of a leader may result in unnecessary round-trips across the globe. To cope with this limitation, several all-leader and leaderless alternatives have been proposed recently. Unfortunately, none of them fits all circumstances. In this article, we argue that the "right" choice of the number of leaders depends on a given replica configuration and the workload. We then present Droopy and Dripple, two sister approaches built upon state machine replication protocols. Droopy dynamically reconfigures the set of leaders, whereas Dripple coordinates state partitions wisely, so that each partition can be reconfigured (by Droopy) separately. Our experimental evaluation on Amazon EC2 shows that Droopy and Dripple reduce latency under imbalanced or localized workloads, compared to their native protocol. When most requests are non-commutative, our approaches do not affect the performance of their native protocol, and both outperform a state-of-the-art leaderless protocol.
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2016.2636148
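
Illustrative note: the summary's point that a poorly placed leader adds round-trips, and that the right choice depends on the replica configuration and the workload, can be seen in a toy model. The sketch below is not the Droopy or Dripple algorithm; it is a minimal illustration under stated assumptions. The function names, the round-trip matrix `RTT_MS`, and the latency model (client-to-leader round trip plus the leader's round trip to its nearest majority) are all hypothetical and chosen only for this example.

```python
# Toy model (an assumption, not taken from the article): a request costs the
# client-to-leader round trip plus the leader's round trip to a majority.

def quorum_rtt(rtt, leader):
    """Round-trip time for `leader` to collect acks from a majority of replicas."""
    n = len(rtt)
    majority = n // 2 + 1
    others = sorted(rtt[leader][j] for j in range(n) if j != leader)
    # The leader's own vote is free, so it needs majority - 1 remote acks.
    return others[majority - 2] if majority > 1 else 0.0

def expected_latency(rtt, workload, leader):
    """Workload-weighted request latency if `leader` is the single leader."""
    q = quorum_rtt(rtt, leader)
    return sum(w * (rtt[c][leader] + q) for c, w in enumerate(workload))

def best_leader(rtt, workload):
    """Index of the site that minimizes expected latency for this workload."""
    return min(range(len(rtt)), key=lambda l: expected_latency(rtt, workload, l))

# Hypothetical symmetric round-trip times (ms) between three sites.
RTT_MS = [
    [0, 80, 150],
    [80, 0, 200],
    [150, 200, 0],
]

# Fraction of requests issued at each site, for two different workloads.
mostly_site_0 = [0.8, 0.1, 0.1]
mostly_site_2 = [0.1, 0.1, 0.8]

leader_a = best_leader(RTT_MS, mostly_site_0)
leader_b = best_leader(RTT_MS, mostly_site_2)
```

With these made-up numbers, the preferred leader moves from site 0 to site 2 when the bulk of the requests moves, even though site 2 has the slowest path to a majority. This is the kind of workload dependence the summary describes.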