Wednesday 12 December 2018

The debate about causal emergence helps explain a tension between distributed systems theory and practice.

There is an old joke that goes like this: A tourist gets lost and then sees a farmer, so he stops to ask directions. The farmer hems and haws and finally says, "Son, I'm sorry, but you just can't get there from here. You may just have to go somewhere else and then try again."

It turns out that there is a situation where this kind of advice might actually make a great deal of sense. A little while back, at an undergraduate-organized "research and society" symposium at Princeton (quite a nice event), I had an opportunity to spend a few hours with Erik Hoel, a Tufts University professor who is a leading proponent of a concept called "causal emergence."

Suppose that you were provided with a great model describing the quantum behavior of oxygen and hydrogen atoms.  In a triumph of scientific computing, you use the model to predict that they combine to form molecules of H2O and even to make new discoveries about how those molecules behave, and how to relate their behavior to their underlying quantum nature.

But can you extrapolate to predict the behavior of a cup of tea, or the steam rising from it? A cup of tea is a very complex thing: a simulation would need to deal with all the interactions between molecules (to say nothing of your half-dissolved teaspoon of sugar and cloud of milk). In practice, there is no way you could do it: the emergent structure can't feasibly be deduced even with a complete understanding of the underlying system.

Erik and his colleagues are actually focusing on human consciousness, and developing a hypothesis that we won't be able to understand human thought purely in terms of the underlying neural wiring of the brain, or the chemical and electrical signals it uses.  They treat the problem as a type of coding question, and argue that the fine-grained details are like noise that can drown out the signal of interest to us, so that no matter how much we learn about the brain, we might still be unable to understand thought.
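Hoel's group actually quantifies this with a measure called effective information: intervene on a system's states uniformly at random and ask how much information the next state carries about the intervention. Here is a minimal Python sketch, using a toy transition matrix of my own construction (not an example from their papers), in which the coarse-grained description scores higher than the fine-grained one:

```python
import numpy as np

def effective_information(tpm):
    """EI = I(X_t ; X_t+1) when the current state is set uniformly at random."""
    tpm = np.asarray(tpm, dtype=float)

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Entropy of the next state under a uniform intervention,
    # minus the average entropy of the individual transition rows.
    return H(tpm.mean(axis=0)) - np.mean([H(row) for row in tpm])

# Micro scale: states 0-2 wander uniformly among themselves (pure "noise"),
# while state 3 maps deterministically to itself.
micro = [[1/3, 1/3, 1/3, 0],
         [1/3, 1/3, 1/3, 0],
         [1/3, 1/3, 1/3, 0],
         [0,   0,   0,   1]]

# Macro scale: group {0, 1, 2} into state A and {3} into state B.
macro = [[1, 0],
         [0, 1]]

print(effective_information(micro))  # about 0.81 bits
print(effective_information(macro))  # 1.0 bit: the coarse view is *more* causally informative
```

The noisy micro-level details drown out part of the signal; collapsing them into a single macro state recovers it.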

This got the audience very engaged at the Princeton event: they seemed to really like the idea that human intellect might somehow be inaccessible to science, or at least to "reductionist" science. Erik, though, mentioned that he doesn't always get a positive reception: there is a scientific community that absolutely hates this work! As he explains it, first, they tend to point to the Greek philosophers and note that Plato and Aristotle came up with this a long time ago. Next, they point out that in computing we have all sorts of impossibility and undecidability results, and that even a basic complexity analysis can lead to similar conclusions. Beyond this, there is a question of whether the concept of layering is even well posed: it is easy to say "I know a cup of tea when I see one," but what, precisely, constitutes a cup? Philosophers adore questions such as this. But... let's not go there!

Is causal emergence just a lot of fuss about nothing? Not necessarily: there is an aspect of this causal emergence debate that fascinates me. As most people who read this blog would know, distributed systems tend to be built from three core concepts -- everything else just combines these building blocks:
  1. We use fault-tolerant consensus to implement consistency (the use cases are very broad and include transactions, state machine replication, leader election, primary-backup coordination, locking, system configuration, barrier synchronization, Zookeeper...). Even our more complex models, such as Byzantine agreement and blockchain, really come down to consensus with a particularly severe fault model.
  2. We divide to conquer, mostly using key-value sharding (see the sketch just after this list). A consensus mechanism can be used to track the configuration of the sharded layer, so the shards themselves are freed to use simpler, cheaper mechanisms: in effect they depend on the consensus layer, but don't need to implement it themselves.
  3. We turn to convergent stochastic mechanisms in situations where a state-machine style of step-by-step behavior isn't applicable (like for the TCP sliding window, or a gossip protocol for tracking membership or loads, or a multi-tier caching policy).
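To make building block 2 concrete, here is a minimal Python sketch of consistent-hash sharding. Assume some consensus service (Zookeeper, say) has already agreed on the list of shards; the names (ShardMap, shard_for) and parameters are illustrative assumptions, not any particular system's API:

```python
import hashlib
from bisect import bisect

class ShardMap:
    """Maps keys to shards via consistent hashing over a virtual-node ring."""

    def __init__(self, shards, vnodes=64):
        # Place vnodes replicas of each shard on the hash ring, so that adding
        # or removing a shard only remaps a small fraction of the keys.
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def shard_for(self, key):
        # The first virtual node clockwise from the key's hash owns the key.
        idx = bisect(self._points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

# The shard list itself would live in (and change through) the consensus
# layer; the shards never have to run consensus themselves.
shard_map = ShardMap(["shard-a", "shard-b", "shard-c"])
print(shard_map.shard_for("user:1234"))
```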
So if you accept this very simplified taxonomy, what jumps out is that, in effect, variations on these three basic kinds of building blocks can be used as "generators" for most of modern distributed computing. But are there behaviors that these three building blocks can't enable? What building blocks would be needed to cover "everything"? I think the causal emergence model sheds some light here: it suggests a new kind of impossibility argument, one that would lead us to conclude that this question might not have an answer at all!

But we've always suspected that. For example, one category of behaviors we often worry about in distributed settings is instability. I've often written about broadcast storms and data-center-wide oscillatory phenomena: these arise when a system somehow manages to have a self-reinforcing load that surges like a wave until it overwhelms various components, triggering episodes of data loss, waves of error-recovery messages, and eventually a total meltdown. We obviously don't want to see those kinds of things, so designers try to bullet-proof their systems using mechanisms that dampen transients. Could stability be in this class of properties that are "hidden" in the low-level details, like Erik's causal emergence scenario?
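As one concrete example of such damping, here is a minimal sketch, my own illustration rather than any specific system's mechanism, of capped exponential backoff with full jitter. The jitter keeps a crowd of retrying clients from resynchronizing into exactly the kind of self-reinforcing wave described above:

```python
import random
import time

def call_with_backoff(op, max_attempts=6, base=0.05, cap=2.0):
    """Retry op(), backing off exponentially with full jitter between attempts."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TimeoutError:
            # Randomizing the whole delay window ("full jitter") spreads the
            # retries out in time instead of letting them arrive as one surge.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise TimeoutError("gave up after repeated timeouts")
```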

Then there is a second and more subtle concern. Think about a wave in the ocean that gradually builds up energy in a region experiencing a major storm, but then propagates for thousands of miles under fair skies: some of the immense energy of the storm was transferred to the nascent wave, which then transports that energy over vast distances. Here we have an emergent structure that literally moves, in the sense that the underlying parcels of water it perturbs change as time elapses. The fascination here is that the emergent structure is really a wave of energy. So when we observe the physical wave, we aren't really seeing the energy wave -- we are seeing a phenomenon caused by the energy wave, yet one step removed from it. Similarly, when a data center becomes destabilized, we are often confronted with massive numbers of error messages and component failures, and yet we might not have direct visibility into the true "root cause" that underlies them. Causal emergence might suggest that this is inevitable, and that sometimes the nature of an instability might not be explicable even with complete low-level traces.

This idea that some questions might not lend themselves to formal answers can frustrate people who are overly fond of reductionist styles of science, in which we reduce each thing to a more basic thing.  That energy wave can't be directly observed, and in fact if you look closely at the water, it just bobs up and down.  The water isn't moving sideways, no matter how the wave might look to an observer.

This same puzzle arises when we teach students about the behavior of electric power grids: we are all familiar with outlets that deliver A/C power, and even children can draw the corresponding sine wave.  Yet many people don't realize that the power signal has an imaginary aspect too, called the reactive component of power.  This reactive dimension actually emerges from a phenomenon analogous to that water bobbing up and down, and to fully describe it, we model the state of a power line as a signal that "spirals" around the time axis, with a real part and an imaginary part.  The familiar A/C signal is just the projection of that complex signal onto the real axis, but the reactive part is just as real -- or just as unreal, since this is simply a descriptive model.  The physical system is the conductive wire, the electrons within it (they move back and forth, but just a tiny amount), and the power signal, which is a lot like that wave in the water, although moving a lot faster.
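A small numerical sketch may help; the figures here are illustrative assumptions, not data from any real grid. We model the line voltage as a complex exponential spiraling around the time axis, recover the familiar A/C waveform as its real projection, and split complex power into its real and reactive parts:

```python
import numpy as np

f = 60.0                                      # line frequency, Hz
t = np.linspace(0, 1 / f, 6)                  # a few samples across one cycle
V_peak = 120 * np.sqrt(2)                     # peak volts for a 120 V RMS line

spiral = V_peak * np.exp(2j * np.pi * f * t)  # the complex "spiral" signal
print(spiral.real)                            # its real projection: what the outlet delivers

# Complex power S = P + jQ: the real part P does useful work, while the
# reactive part Q just "bobs up and down", like the water under the wave.
V_rms, I_rms, phi = 120.0, 10.0, np.pi / 6    # current lags voltage by 30 degrees
S = V_rms * I_rms * np.exp(1j * phi)
print(f"P = {S.real:.0f} W, Q = {S.imag:.0f} VAR")
```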

In effect, electricity is an emergent property of electric systems.  Electricity itself doesn't have an imaginary dimension, but it is very convenient to model an A/C electric circuit as if it does.

Viewed this way, causal emergence shouldn't elicit much debate at all: it is just a pretext for pointing out that whereas the physical world is shaped by physical phenomena, we often perceive it through higher-level, simplified models. Viewed at the proper resolution, and over the proper time scale, these models can be incredibly effective: think of Newtonian mechanics, or hydraulics, or the electric power equations.

And yet as a person who builds large distributed systems, I find that people often forget these basic insights. For me, and for any distributed systems builder, it can be frustrating to talk with colleagues who have a deep understanding of the theories covering small distributed services, but never actually implement software. There is a kind of unwarranted hubris that theoreticians sometimes slip into: a belief that their theories are somehow more valid and complete than the real system.

In fact, any builder will tell you that real systems are often far more complex than any theory can model. That old farmer would understand. Causal emergence potentially offers a rigorous way to back up such a claim.

The usual theoretical riposte is to say "show me the detailed model, and express your goal as an abstract problem, and I will solve it optimally." But not everything that occurs at large scale can be expressed or explained using lower-level models. This is a deep truth that our community really needs to internalize. If it did, it would (I think) lead to a greater appreciation for the inherent value of high-quality engineering and very detailed experiments. Sometimes, only engineering experience and careful study of real systems under real loads suffice.