Tuesday 25 September 2018

Blockchain verification and the dangers of branching temporal logic models (very technical)

Many years ago, I worked with some colleagues on an ill-fated topic: we tried to write down a logical statement concerning the guarantees provided by atomic multicast systems that manage their own membership.  Today, we know how to do that, courtesy of Lamport’s Paxos specification and the proof methods he introduced.

But those were the Wild West days, and that particular project occurred before the Paxos specification was created.  Moreover, our atomic multicast (which actually could be configured as a Paxos protocol), also included some optional features Leslie omitted from Paxos, for good reason.  Those centered on a form of optimistic early delivery combined with barrier synchronization (analogous to a memory fence).

Our puzzle centered on expressions describing "all the possible" future behaviors that could arise with this rather complex optimistic form of early delivery.  The problem was that the set of future scenarios grew exponentially as new members join and existing members failed (or exited voluntarily).  Our logic needed to "account" for all of these possibilities.  In fact, the logic itself had a flaw, but even had we managed to write it down properly, we still would have had a semantic in which statements can only be interpreted within the “space” of future scenarios.  Since it is intractable to do model checking in an exponentially growing state space, such statements are often undecideable: they can have a true or false value and yet no decision procedure can be created that will terminate in bounded time.

A total mess.  We ultimately abandoned the effort, tails between our legs, and came to view it as an embarrassment.  The less said about it, the better!

Except... I sometimes tell the story, as a cautionary tale.

Not every technology suffers from the issue we encountered.  Lamport's way of formalizing Paxos avoids the issue by avoiding speculative "early delivery", which was the real issue in our early work.  This is one reason that Lamport's Paxos specification was such a success.

Transactional database models also have a better way of handling such problems: when we ask whether a database system is in a serializable state, the rule is to start by erasing all the uncommitted transactions, at which point serializability is defined as a property of the committed state.  This approach accepts that transactions could glimpse inconsistent states while executing: it isn't a problem so long as those transactions can't commit.  Moreover, it erases all the events that depend on future outcomes, neatly avoiding the whole issue our unfortunate effort ran up against.

Which brings us to BlockChain.  I'm intrigued by the recent work that seeks to put a kind of transactional database behavior "into" the BlockChain, by incorporating SQL-like statements into the transactions themselves, but then reevaluating them as the BlockChain steadily grows.

To appreciate why this poses the same problem I struggled with twenty years ago, think about a smart contract that says something along the following lines: "John agrees to sell me his horse, Bucky, for the sum of $1,500, and has accepted a $150 deposit.  If I haven't completed the purchase within a week, John agrees to return the deposit.  But in the meanwhile, John can continue to try and sell Bucky.  If he finds another buyer, he can cancel my transaction, but in that case must both return the deposit and also pay me an addition $100, to compensate me for my trouble."

The world is full of contracts like these.   Smart contracts can express things like rules for computing interest that depend on global interest rates.   We probably all remember 2008, when the world financial system melted down over issues with mortgage-backed  securities split into interest and principle.  The claim is that the expressive power of smart contracts is a good thing, because smart contracts can be analyzed by tools (compiler-style tools), and hence it should be possible to automatically identify risks. Risk managers would then have robust ways to compute their risk and hedge against it, and those hedging contracts (insurance deals) could also be carefully evaluated, etc.  Back in 2008, it wasn't the mortgage-backed securities that collapsed, per-se.  It was the insurance companies that insured them, but didn't hedge the "secondary" risk properly.

So... how does one attach a "formal meaning" to a smart contract?  Let's go back to John's sale of Bucky.  Notice that this contract depends on how things play out into the future.  For the coming week, the BlockChain will grow, and each new block added to the chain could bring events relevant to the contract.  John could decide to cancel the contract and sign a new contract with Sally (perhaps she is ready to pay more -- enough to more than compensate John for the $100 he'll forfeit).   I could show up with the remaining $1350, and head home with Bucky.  A week could pass, and John would have to return my $150 deposit.  And it gets worse: John could sign with Sally, but then Sally might cancel her deal, and perhaps John would then want to reinstate his deal with me.

Much as in that early work I tried to do, John's smart contract with me has a meaning that can depend on a branching future state: some large (maybe exponential) space of possibilities, each leading to its own proper interpretation of the correct "thing to do".  Should John hand Bucky over to me, or not?  Do I owe him $1,350, or does he owe me $150, or should it be $250?

Without much trouble, we can design sequences of smart contracts in which to know the proper outcome for my contract, I need to figure out the outcome of Sally's contract (and this then becomes an induction, because Sally's contract may depend on the outcome of Carl's contract).  This is precisely how my early work failed: you end up with scenarios that can be arbitrarily prolonged, and the total space of scenarios grows exponentially in the length of the future chain,  because of an endlessly longer sequence of new events that each depends on its own future outcomes.

Beyond all of which we have the issue of rollbacks: even if you accept the common wisdom and adopt the view that a BlockChain prefix has magically "committed" once it has been extended by six or more blocks, we still run into the problem that the suffix is unstable.  So we could have one suffix in which Sally's transaction finalizes, but it might then rollback, aborting that outcome and perhaps replacing it with one in which Sally cancels her purchase.

Should it trouble us that smart contracts on BlockChains might not have a tractable meaning -- a reduction to temporal logic -- if they include future references?  For that matter, even without future references, some of the aspects just mentioned would still arise.   Is this bad?

I think so: it seems to me that in computing, if we're learned one thing over the decades, it is the overarching importance of rigorous semantics.  If BlockChains with smart contracts can't be reduced to a stable logical framework in which proofs can be carried out without solving infeasible problems (evaluating logical formulas within exponentially growing state spaces is a well-known infeasible problem), then we are looking at a profoundly untrustworthy framework.

So beware, all of you rabid BlockChain investors!  If you are betting big on smart contracts, you owe it to yourselves to figure out a way to reduce the statements those contracts make to a stable, computationally feasible form.  You know what they say: those who fail to learn from the past are doomed to repeat it.  If you don't find a robust and tractable semantics for your technology, then someday, twenty years from you, too will be writing little blog postings about how your work once took a hopelessly wrong turn...  and that Professor Birman's sad story of his unfortunate foray into the theory of branching future executions should have warned you!

2 comments:

  1. It seems to me the problem is we actually need Temporal Logic or maybe Computational Tree Logic to formalize the semantics of smart contracts. However this is done very informally nowadays such as "If I call this method before timeout then it means I abort or else ..." which poses a risk on everyone who relies on similar informal reasonings.

    ReplyDelete
    Replies
    1. Right, but the point I'm making is that no matter how you express the properties you want to claim that your solution has, you may be trying to verify (model-check, for example) those properties in a configuration space that grows exponentially rapidly as function of how far into the future we need to consider. So if a BlockChain record has some form of smart contract that might not finalize for some long period into the future (maybe in some cases, never), your logic challenge involves proving that the contract has the desired properties in each of these states.

      As a general matter, one way to pose an undecidable question is to ask us to "evaluate something" in an exponentially large space of possibilities. No matter how fast your computer, the set of possibilities will grow faster than you have compute cycles to carry out the test. So you end up with an effectively undecidable problem. It becomes impossible to say whether the property would hold, or not.

      How could one evade this seeming barrier?

      One option is to bound how far into the future a smart contract could hold. Then you only need to ask what its properties would be for some boundedly long extension of the BlockChain. The exponential growth of the number of cases to consider will be capped.

      Another is to be able to reason in some general way: "in all possible futures, X will be greater than 17. Accordingly..." This style of reasoning can sometimes cover any possible future state, so you don't need to look at them case by case.

      But BlockChain smart contract languages seem a bit too clever and too flexible for us to safely assume that contracts would have these kinds of simplifying properties. And in the general case, it won't matter what logic you use. The verification task will still be intractable.

      Delete

This blog is inactive as of early in 2020. Comments have been disabled, and will be rejected as spam.

Note: only a member of this blog may post a comment.