Transactions and strongly consistent group replication solve
different problems. We need both.
A quick summary of this whole set of postings:
· Transactions, and in particular transactional key-value stores, are a genuinely exciting technology for sharing information between tasks that are mutually anonymous. By this I mean tasks A and B that update and share information about some sort of external reality, but where A has no direct knowledge of B and vice versa: they are not directly coordinating their actions, or running as two parts of some overarching structured application. (A sketch of this style of sharing appears near the end of this posting.)
o The ACID properties make sense for this use case.
o The model of lightweight tasks that use the transactional memory as their sole persistent store is very natural for the cloud, so we see this approach in PaaS solutions.
o Often we say that such tasks are stateless, meaning that each activation starts in the original state defined by the virtual machine or the code; no state is carried from one activation to the next (except in the transactional store, where state persists).
· Group replication is a great model for high-availability programming. You design with a state-machine style of execution in mind, although generally the rigid lock-step behavior of a state machine occurs only in replicated objects that might be just a small subset of the functionality of the overall application. (A small sketch of this replication style follows this list.)
o For example, a modern application might have a load-balancer, a cache that is further subdivided into shards using a key-value model, and then each of the shards might be considered as a state-machine replicated group for updates, but its members might operate independently for RPC-style read-only operations (to get read parallelism). Then perhaps the same application might even have a stateful persistent back-end.
o We use virtually synchronous atomic multicast and virtually synchronous Paxos in such applications: virtual synchrony ensures consistency of the membership updates, and then the atomic multicast or Paxos protocol handles delivery of messages or state updates within the group.
· In today’s cloud we might want both functionalities, side by side.
o The virtual synchrony group mechanisms make it easy to build a scalable, fault-tolerant distributed program that will be highly available, responding very rapidly to external stimuli. You would want such a program in an Internet of Things setting, to give one example.
o But if you run such a replicated program many times, separately, the program as a whole is quite similar to the tasks mentioned in the first bullet: each of these multi-component programs is like a task. We might do that for a setting like a cloud-hosted controller for a smart car, where we want high availability and speed (hence the group structure), but also prefer to have one of these applications (replication and all) per car. We could then launch half a million of them on the same cloud, if we were interested in controlling the half million or so cars driving on the California state highway network on a typical afternoon.
o These group-structured programs might then use a transactional key-value store to share data between instances (in effect, the car-to-car dialog would be through a shared transactional store, while the single-car-control aspect would be within the group).
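To make the state-machine idea concrete, here is a minimal C++ sketch of the replication style used inside one car's group. Everything here is illustrative: the total order is simulated by a single in-process list, standing in for what a virtually synchronous atomic multicast (Derecho's, for example) would deliver, and CarState, Update, and Replica are invented names, not a real API.

```cpp
// Sketch: state-machine replication for a single car's controller.
// The total-order multicast is simulated by one in-process log; in a real
// deployment a virtually synchronous atomic multicast would deliver the
// same updates, in the same order, at every replica in the group.
#include <iostream>
#include <vector>

// Hypothetical per-car replicated state (illustrative only).
struct CarState {
    double speed = 0.0;
    double heading = 0.0;
};

struct Update {
    double dSpeed;
    double dHeading;
};

// Each replica applies updates deterministically, so identical ordered
// inputs yield identical state at every member of the group.
struct Replica {
    CarState state;
    void apply(const Update& u) {
        state.speed += u.dSpeed;
        state.heading += u.dHeading;
    }
};

int main() {
    std::vector<Replica> group(3);        // three members of one car's group
    std::vector<Update> totalOrder = {    // stands in for the atomic multicast
        {5.0, 0.0}, {2.5, -10.0}, {0.0, 3.0}
    };
    for (const Update& u : totalOrder)    // every replica sees every update,
        for (Replica& r : group)          // in the same order
            r.apply(u);
    for (const Replica& r : group)
        std::cout << r.state.speed << " @ " << r.state.heading << "\n";
}
```

Because apply() is deterministic and every replica sees the same updates in the same order, the three members end in identical states; virtual synchrony is what makes membership changes and state transfer to joining members well-defined.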
Here’s a picture illustrating a group-structured program like the one just mentioned:

[Figure: a group-structured application with a load-balancer, a sharded cache layer, and a sharded back-end store.]
In this example we actually see even more groups than were mentioned above: the back-end is sharded too, and there are additional groups that the back-end might use to notify the cache layer when updates occur, as is needed in a so-called coherently cached, strongly consistent store. Paxos would be useful in the back-end, whereas the cache layer would be stateless and hence would only need atomic multicast (which is much faster). Applications like this are easy to build with Derecho, and they run at the full speed of your RDMA network, if you have RDMA deployed.
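Here is a compact sketch of that coherent-cache pattern. It is a single-process stand-in: commit() plays the role of a Paxos-replicated durable write in a back-end shard, and the subscriber loop plays the role of the faster atomic multicast that keeps cache members coherent. BackendShard and CacheShard are invented names, not Derecho or FaRM APIs.

```cpp
// Sketch of a coherently cached, strongly consistent store.
// The back-end "commit" stands in for a Paxos-style durable update;
// the notification loop stands in for the atomic multicast that tells
// cache-layer members to refresh. All names are illustrative.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

class CacheShard {
    std::unordered_map<std::string, std::string> entries;
public:
    void onUpdate(const std::string& key, const std::string& value) {
        entries[key] = value;                 // keep the cache coherent
    }
    std::string read(const std::string& key) { return entries[key]; }
};

class BackendShard {
    std::unordered_map<std::string, std::string> store;  // durable state
    std::vector<CacheShard*> subscribers;                // cache members
public:
    void subscribe(CacheShard* c) { subscribers.push_back(c); }
    void commit(const std::string& key, const std::string& value) {
        store[key] = value;        // real system: Paxos-replicated write
        for (CacheShard* c : subscribers)   // real system: atomic multicast
            c->onUpdate(key, value);
    }
};

int main() {
    BackendShard backend;
    CacheShard cacheA, cacheB;     // two members of one cache shard
    backend.subscribe(&cacheA);
    backend.subscribe(&cacheB);
    backend.commit("car-42/route", "I-80 westbound");
    std::cout << cacheA.read("car-42/route") << "\n";  // both caches agree
    std::cout << cacheB.read("car-42/route") << "\n";
}
```

The ordering matters: the durable write happens before the caches are told, so a reader that hits the cache never sees a value the back-end could still lose.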
The point, then, is that replication is used here to ensure rapid failure recovery of the application instance (this is the data in the back-end store), and for scaling to handle high query rates (the replication in the cache layer). But this form of replication is entirely focused on local state: the information needed to control one car, for example.
I tried to make a picture with a few of these structured applications sharing information through a key-value store, but it gets a little complex if I show more than two. So here are two of them: task A on the left and task B on the right. The transactional DHT in the middle could be FaRM or Herd:

[Figure: tasks A and B, each a group-structured application, sharing a transactional DHT.]
Notice the distinction: local state was replicated within each of task A and task B (handling two different vehicles), but the global state that defines the full state of the world, as it were, is stored in the transactional key-value database that they share. This matters: processes running within A or within B know the structure of the overall car-manager task in which they run. Subtasks of a single task actually know about one another and cooperate actively. In contrast, when we consider task A as a unit and task B as a unit, the picture changes: A does not really know that B is active, and they do not normally cooperate directly. Instead, they share data through the DHT: the transactional key-value subsystem.
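To show what that sharing might look like in code, here is a minimal sketch of an optimistic transaction against the shared store. The validate-versions-at-commit style is roughly the flavor of optimistic concurrency control FaRM uses, but the TxKVStore class and its API are invented for illustration, and the store is a single in-process map, not a real DHT.

```cpp
// Sketch: mutually anonymous tasks sharing global state through a
// transactional key-value store. Commit validates the version read
// earlier, so conflicting updates abort rather than silently clobber
// each other. All names here are illustrative, not a real API.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct Versioned {
    std::string value;
    uint64_t version = 0;
};

class TxKVStore {
    std::map<std::string, Versioned> data;
public:
    Versioned read(const std::string& key) { return data[key]; }
    // Commit succeeds only if the key is still at the version we read,
    // so two anonymous tasks cannot overwrite each other unnoticed.
    bool commit(const std::string& key, uint64_t readVersion,
                const std::string& newValue) {
        Versioned& v = data[key];
        if (v.version != readVersion) return false;  // abort: caller retries
        v.value = newValue;
        ++v.version;
        return true;
    }
};

int main() {
    TxKVStore shared;                    // the DHT both tasks talk to
    Versioned seen = shared.read("intersection/5th-and-main");
    // Task A publishes its intention; task B would do the same, and if
    // both raced, one commit would fail validation and be retried.
    bool ok = shared.commit("intersection/5th-and-main",
                            seen.version, "car-42 crossing");
    std::cout << (ok ? "committed" : "aborted, retry") << "\n";
}
```

If tasks A and B both tried to update the same key, one of the two commits would fail validation and retry: this is how mutually anonymous tasks avoid trampling one another's updates without ever coordinating directly.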
As in the case of RDMA, there is a lot more to be said here. I've posted sub-topic remarks on a few of the points that I find most interesting.