A Few Thoughts on Distributed Computing: August 2020

Alice and Bob, two star-crossed lovers, had planned to meet for lunch in downtown Palo Alto if the weather was good, this to be determined by Alice, who would email Bob, who then would confirm (being a late-riser, there was some risk of him not actually seeing an email in time). But they never actually did meet, and indeed, Alice never once sent an email -- despite the fact that Northern California had a near record of three months without a single cloudy day! When those emails didn't arrive, Bob realized that he couldn't imagine life without her.

By now you've figured out the puzzle, if you tried. The core problem is uncertainty: When Alice emails Bob "I just checked, nice weather again (no surprise...). How about Vegan Burger at 12:15?", she has no certainty Bob will see the email in time to meet her. Bob, who woke up early that morning, replies instantly: "Can't wait!". But did Alice see his reply? Alice replies to Bob: "It will be so nice to see you again...". And so it goes... with neither ever reaching a state of certainty. So, realizing this, Alice didn't waste the bits on a pointless, unending exchange. Pankaj Mehra wins the prize for being first to post the correct answer!

In fact the deeper puzzle is precisely the opposite of the one I posed. The question that we really need to ask is "Why, then, do lovers ever manage to meet for lunch? And how can distributed systems reach agreement, on anything?"

Yoram Moses and Joe Halpern formalized this scenario using what came to be known as a "knowledge logic." You start with a normal logic, but have a set of participants (Alice, Bob, ...) and a way for them to exchange messages, which are generally treated as reliable, but with unbounded delays. If predicate P is true for Alice, we say that Alice "knows" P: K(P)Alice. Now we can write K(K(P)Alice)Bob to denote that Bob knows that Alice knows P. For example, Bob learns that Alice knows that the weather is fine. The problem is that at the moment Bob learns this (transitions from not knowing to knowing), Alice does not know that Bob knows that Alice knows that the weather is fine. When Bob emails his confirmation, the same issue arises but with their roles reversed: now Alice knows that Bob knows that the weather is nice, but Bob doesn't know that Alice saw the confirmation, hence for him, the situation is actually unchanged. This knowledge asymmetry is the root of their dilemma.

Last week I suggested that anyone puzzled by my puzzle watch The Princess Bride. Early in the movie there is a fantastic scene structured around this exact question. The criminal Vizzini -- who happens to be the smartest man in the world (Wallace Shawn) --engages in a duel to the death with Westley, who is the current incarnation of the Dread Pirate Robert (Cary Elwes). The duel centers on deciding which of two cups of wine has been poisoned. Vizzini debates himself until, after some number of back-and-forth ripostes, he suddenly concludes "but Westley certainly isn’t smart enough to have realized this!" He seizes one of the cups and drinks it, while Westley drinks the other. Shawn dies... but was his logic wrong?

The logic of knowledge is quite elaborate, and can express all sorts of statements about distributed systems and multi-party reasons. It also comes with some theorems, including the following. To get rid of all the names, let's say that Bob has "K1" knowledge of P if Bob knows that someone knows (or some set of people) P. So if Bob knows that Alice knows the weather, he has K1 knowledge (Alice has K0 knowledge). Once Alice reads Bob's first acknowledgement, she has K2 knowledge, and so forth.

The bizarre aspect of our lover's dilemma is that both Alice and Bob actually do know that the weather will be fine -- and both of them quickly learn that both of them know this to be true. Both, that is, transition through a K1 knowledge state. Yet the particular (peculiar, one might say) setup is such that the weather being fine is not enough. To actually meet, both must be sure that both will be there. This is the aspect over which the plan stumbles.

K* knowledge, also called "Common Knowledge", is defined to be a level of knowledge that implies Kn for all values of n. Yoram and Joe proved in their papers that common knowledge cannot be achieved in a symmetric system where all the participants have equal trust in one-another and communicate via messages. Alice and Bob, in effect, agreed to require common knowledge, not realizing that with uncertain communication delays, this wouldn't be possible.

A curious feature of common knowledge is that if we simply introduce a special participant who is trusted to disseminate common knowledge, a system can simply accept the statements of this omnipotent participant and achieve things that, lacking such a participant, would be infeasible.

Here's a famous example that I often use when teaching the idea. Alice and Bob, now a couple with children, have organized a birthday party for their daughter Carol. The kids come in from the yard, where they have been playing and on their way in, they wash up... but poorly. None washes their face, and as a result all have mud on their faces, but nowhere else. So Bob announces "in our house, you need to be clean to sit at the table. So if you want cake, you need a clean face!". No child loves to wash twice (and children are like cats where water on their faces is concerned!). Not one budges. Bob, increasingly frustrated, repeats this a few times. Dessert is in peril!

Meanwhile, in a different reality, Bob said something slightly differently: when the kids came in, his first remark was "Wow, I see a dirty face!" All of the children hear him clearly, and of course, young children place absolute truth in adults. In this situation, with N children, all march to the sink the N'th time Bob repeats the rule.

The oddity in all of these stories is that everyone seemingly knows the answer, yet they behave very differently depending on how common knowledge is handled. Alice knows the weather is fine. When she gets Bob's reply, he knows this too, and she is sure he knows -- and if she takes the next step and replies to his reply, he knows that she knows this. Clearly, this should be adequate for a nice veggie burger special!

Or consider the children: every child sees mud on every face (except of course, their own), so on Bob's very first announcement, why didn't all of them head for the sink? But in fact, in version two, they do head for the sink.

In reality two, Bob's introduction of common knowledge seems to be of vital importance. But common knowledge is indefinitely elusive! You can construct N-child proofs that reduce the N-child case to the N-1 child case, and the crux of the proof turns out to be similar to the dilemma of Bob and Alice: with 2 children, perhaps Carol and Doug, you can play this out. In reality one, Carol is hoping against hope that her face is clean (and similarly, Doug). She sees mud on Doug's face, he sees mud on hers, so neither budges. But Carol is hoping Doug sees a clean face, and Doug is hoping she is looking at his clean face. Lacking certainty that at least one child has a dirty face, neither moves. When Bob introduces that common knowledge, it changes the game, and both can deduce that their own face, tragically, must be muddy. How can we avoid making the mistake of requiring Common Knowledge in a problem statement?

We often build systems that need to cooperate, elect leaders and so forth. We certainly use a logic of knowledge when designing them; the resulting built-in facts are, in effect, common knowledge. But the dynamic form of knowledge associated with new events is the more interesting case. Why is it that we can solve problems like implementing state machine replication using virtually synchronous Paxos (the consistency model and protocol employed in Derecho)?

Do we think of Derecho as a trusted source of common knowledge, or is it simply that these kinds of problems don't actually need common knowledge?

For me, the really fascinating insight is that when a system manages its own membership, creating epochs of stable membership, state machine replication seems to be a complete story for everything we ever need to do in distributed systems: if we need consistency, this model offers the answers; if we don't need consistency -- well, then use TCP and you should be happy! Moreover, as far as I can tell, these state machine replication protocols use the membership as a form of common knowledge. This suggests that a system like Derecho is actually a two-level knowledge structure, with one structure subordinate to and trusting the other, and that higher level structure permitted to hand common knowledge statements to the lower one!

This, however, is not something I have ever seen treated in a paper. Perhaps someone has looked at it, but if so, it didn't cross my desk!

The other side of this story, equally fascinating, is that K1 knowledge is sufficient both for self-managed membership and for tasks such as ordered multicast delivery with safety in the event of a crash: In the Derecho paper I linked to earlier, we discuss this and present our reduction of the protocols to a knowledge logic and out implementation over 1-sided RDMA. So, in this two-level formulation, we seem to have a K1 protocol at each level, but with one level viewed as a source of common knowledge by the other level!

Back in the real world, there is a third aspect to the same insight. If you reflect on it, Bob and Alice's dilemma "made sense" and does seem to capture the way we describe situations and problems. Yet the K1 style of solution used in real systems also seems relevant: in real life, Bob and Alice would never have gotten into such a jam -- Alice would have emailed that the weather is fine, Bob would have acknowledged. His iphone would show a little check mark: sent. Then a second one: read by Alice (or more accurately, it popped up on her screen and she swiped it away). Thus in the real world, Alice and Bob make a kind of leap of faith and begin to treat the weather as common knowledge.

This idea of a leap of faith is really about probabilistic reasoning: we simply decide, at some point, that the likelihood of a problem is so low that it can be dismissed. Logicians (including both Joe and Yoram) have looked at this issue, and shown that it has a great deal of relevance to AI. But I'm not actually convinced that Derecho involves such a leap of faith.

If anyone knows of work on this question, I would love to hear about it! And in the meanwhile, stay safe and enjoy the rest of your summer, happy in the knowledge that it all worked out for Alice and Bob.

A Few Thoughts on Distributed Computing

Friday, 21 August 2020

Knowledge logics (and love)

Wednesday, 12 August 2020

A little puzzle to amuse you on a summer evening