Wednesday, 7 December 2016

Some thoughts about anonymity in the BitCoin BlockChain protocol


BitCoin has spawned a whole series of information revolutions: some concerned with digital currency, some more focused around anonymity, and some concerned with the potential for cryptographic block chains.  I thought I might say a few words on the topic.  But I should maybe remark first that I’m not a world expert on this topic, although I’ve read a great deal and talked with people who are doing cutting edge research.  So take these as a few thoughts from someone on the periphery.
I didn't give much thought to BitCoin for a long time: obviously, I was aware of it, and read about arrests of the folks running Silk Road, which was a dark web site (meaning you accessed it over cryptographically tunneled links) hosting a kind of Amazon.com of illicit substances, prostitution, and apparently even murder for hire.  I won’t summarize the story, but I definitely recommend reading about it.  Anyhow, Silk Road transactions were all denominated in BitCoin, because the currency was reasonably widely available, and offered anonymity.
But I always viewed BitCoin as an interesting application built using a technically uninteresting protocol: we've worked with replicated append-only persistent logging for ages (after all, this is what Paxos is about), and Byzantine versions too (consider the work that Castro and Liskov did at MIT on practical Byzantine replication).   Thus the fact that BitCoin had such a protocol within it was kind of unimportant to me.
But sometimes, a technology takes hold and you are sort of forced to begin to think about it.  In my crowd, people talk about BitCoin quite a lot these days, and are getting interested in BlockChain technology.   So I keep finding myself pulled into discussions about the technology.  The topic I think is worth discussing here isn’t specific to the way BitCoin works, but relates to its BlockChain. 
We can understand BitCoin in terms of several distinct aspects:
  • BitCoin is a digital currency (or commodity) and people buy and sell them, and use them in transactions.  So one dimension is concerned with BitCoin in this role as currency.
  • BitCoin centers on a style of agreement protocol that builds an append-only log of transactions: the BlockChain.  The specific BitCoin log has the difficulty that because the protocol itself is capable of rollback (meaning that sometime after block A is appended to the BlockChain, subsequent events can replace A with some other block B, invalidating transactions contained in A), there is never absolute certainty about the actual final state of the BlockChain.  In fact, although rollbacks of more than 3 blocks at the tail of the BlockChain are very rare, an omniscient adversary could construct a situation that would roll the entire BitCoin BlockChain back to the first block[1]. 
  • The protocol is designed for anonymous participants and is intended to tolerate some level of Byzantine mischief, but the precise conditions for progress are hard to pin down, in part because arbitrary rollback is a part of the model.
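To make the rollback behavior concrete, here is a minimal sketch of a "longest chain wins" rule.  It is a simplification under stated assumptions: blocks are plain strings, the function name adopt_longest is hypothetical, and chains are compared by block count, whereas the real BitCoin protocol compares accumulated proof of work.

```python
def adopt_longest(local, candidate):
    """Return (chain_to_keep, rolled_back_blocks) under a longest-chain rule.

    Simplification: chains are compared by length only.
    """
    if len(candidate) <= len(local):
        return local, []          # keep what we have, nothing is undone
    # Count how many leading blocks the two chains share.
    common = 0
    for mine, theirs in zip(local, candidate):
        if mine != theirs:
            break
        common += 1
    # Everything we held past the shared prefix is rolled back.
    return list(candidate), local[common:]


earth = ["genesis", "A", "C"]
mars = ["genesis", "B", "D", "E"]
chain, rolled_back = adopt_longest(earth, mars)
```

In this toy run the blocks A and C vanish from the adopted chain, along with any transactions they held, which is exactly why no suffix of such a log can ever be treated as final.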
The protocol has an implied assumption that the network is fast enough to fully replicate updates (via gossip broadcast) within a short amount of time, which would normally be seconds when computers are active.  Of course when a computer reconnects after being shut down for a while, it takes longer because it will need to catch up.  This assumption isn’t really stated either.
The conditions under which a particular BitCoin transaction block has become fully stable (would never roll back) are somewhat fuzzy, but because a rollback-free BlockChain prefix is strong enough to achieve consensus, they cannot be weaker than the bounds for progress in a consensus (uniform agreement) system operating with a partially synchronous network.  This question was studied formally, and results by Lynch, Keidar and others would seemingly apply.
In fact it may be possible to show that in a rollback-free BlockChain prefix, created by the BitCoin protocol, there is a sense in which the protocol runs as a series of epochs, with the (anonymous) members of the present epoch effectively voting in the (anonymous) members that will comprise the next epoch.  I’ve wanted to look at the problem carefully, but haven’t had the required time (anyhow, the proof, if I am correct, might really tax my formal skills).  If you can prove this, I’m happy for you to take credit, but if you didn’t formulate the problem prior to reading this blog entry, I would appreciate credit for “proposing the problem”!
The solution makes heavy use of anonymity.  The participating endpoint computers that hold BitCoins and mine for new BlockChain extensions are all named by cryptographic keys, and can create new names for themselves as often as they like.  The proof-of-work aspect of the BlockChain protocol prevents what are called Sybil attacks, in which some computer hijacks a system by pretending to be an enormous number of computers operating in concert.  Without anonymity and the fear of Byzantine behavior, BitCoin’s proof of work would not be needed.
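For readers who have not seen proof of work, here is a toy hash puzzle.  It is a sketch, not BitCoin's actual mining code: the function names, the 8-byte nonce encoding and the single SHA-256 pass are all assumptions made for illustration.  The point is only that creating an acceptable block costs real computation, so minting extra identities buys an attacker nothing.

```python
import hashlib


def meets_target(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if the hash of data+nonce falls below the difficulty target."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Brute-force search for the smallest nonce that meets the target."""
    nonce = 0
    while not meets_target(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


# About 2**12 hash attempts are expected at this toy difficulty.
nonce = mine(b"block of transactions", 12)
```

Checking a claimed nonce takes one hash, while finding one takes thousands at this difficulty.  Influence over the log is thus tied to computing power spent rather than to the number of names a participant creates.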
The power of the BlockChain
The modern view is that the concept of a BlockChain should be treated as a separate entity: in this perspective, BitCoin needs a BlockChain with certain additional special properties, and implements one using its own probabilistic protocol.
Viewed in this more modern way, a BlockChain is simply:
  • An append-only log of records.
  • The records are ordered, which is evident from the first requirement, but also include a cryptographic signature that somehow witnesses the prior blocks.  Thus the integrity of the BlockChain can easily be checked by scanning it front to back or back to front, confirming that each record correctly countersigns the prior ones.
  • The content of each of the blocks in the BlockChain is similarly protected: each block has a cryptographic signature spanning the information it holds.  Various schemes can be used: a hash over the records, a Merkle tree of signatures, etc.  But the upshot is that a valid BlockChain has a form of cryptographic proof of integrity built into it.  An intruder who seeks to tamper with the chain would need to rewrite the entire suffix starting with the modified record, and the deployment would often make this prohibitively difficult (for example, in BitCoin, the whole BlockChain is fully replicated to all participants).
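The three properties above can be sketched in a few lines.  This is a minimal illustration under stated assumptions: a plain SHA-256 hash stands in for the "cryptographic signature", the block layout and function names are hypothetical, and nothing here reflects BitCoin's actual block format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, records):
    """Hash covering both the block's records and the prior block's hash."""
    payload = json.dumps({"prev": prev_hash, "records": records}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(chain, records):
    """Append-only extension: the new block witnesses the current tail."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "records": records,
                  "hash": block_hash(prev, records)})


def verify(chain):
    """Scan front to back, checking every link and every block's contents."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(block["prev"], block["records"]):
            return False
        prev = block["hash"]
    return True


log = []
append(log, ["Ken pays Edward 75"])
append(log, ["Edward pays Ken 15"])
ok_before = verify(log)
log[0]["records"] = ["Ken pays Edward 7500"]  # tamper with an early record
ok_after = verify(log)
```

Changing an early record breaks its own hash, and repairing that hash breaks the link from the next block, so a tamperer is forced to rewrite the whole suffix.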
Described in this way, a BlockChain could be implemented in many ways -- BitCoin implements it as a protocol between what it portrays as anonymous, Byzantine participants, but actually you could store your BlockChain just as easily on a standard cloud computing framework like Azure, and it could store all sorts of data -- not just BitCoin transactions.  In fact, and I'll expand on this below, you might be wise to do this (to use a more standard way of storing your BlockChain).  I say this because many of the purported properties of the BitCoin protocol simply do not hold for the protocol BitCoin implements.  The protocol is just wrong.  In fact I'm not sure it even deserves to be called "wrong" because to be wrong, it would need a clear specification, so that I could say "in such and such a situation, it violates its specification".  But BitCoin doesn't even have a real specification for its BlockChain: this is a case of a solution without a problem statement.  So how can it be wrong?  It isn't even wrong: wrong would be better!
Let's try and break things down and tackle them step by step.
First, just for clarity, what I am pointing out is this: obviously, the actual records within a BlockChain could record digital currency transactions, which is the only use made of them by BitCoin, but you can also create infrastructure to store far more elaborate information into the chain, and could then store them into any kind of database that won't lose them or permit tampering.  There are a number of new commercial products that focus on this idea: they define higher level languages for encoding digital contracts and then store them into some form of highly reliable storage system.
Where BlockChain systems depart from this is that they implement the BlockChain as a highly replicated structure: every single BitCoin participant ends up with a full copy and sees every update to it (updates always take the form of appends: new records that extend the length of the chain).  So we have the abstraction of a record recording transactions, and then we have the abstraction of an append-only log with cryptographic tamperproofing, and finally we have a way to implement such a log, over what turns out to be a gossip protocol.  These are protocols in which system participants share full information with one another: A contacts B, and then A sends B whatever A knows that B lacks, and vice versa.  Gossip can be very robust, and BitCoin benefits from that.  A further advantage is that gossip can operate without tracking the full system membership -- transitive coverage of the full set is sufficient.  However, let's return to this below, because it isn't as simple as it sounds.
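The A-contacts-B exchange is easy to sketch.  In this illustration each participant's knowledge is just a set of record names, and the node names and the fixed contact schedule are assumptions of mine (a real gossip system would pick peers at random).

```python
def gossip_exchange(a, b):
    """One push-pull contact: each side receives whatever it lacks."""
    a_lacks = b - a
    b_lacks = a - b
    a |= a_lacks
    b |= b_lacks
    return len(a_lacks), len(b_lacks)


# Four participants; no one knows the full membership, only a neighbor.
nodes = {"A": {"tx1"}, "B": {"tx2"}, "C": set(), "D": set()}
contacts = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
for x, y in contacts:
    gossip_exchange(nodes[x], nodes[y])
```

After this single pass every node holds both tx1 and tx2, even though A never contacted C directly: transitive coverage did the work.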
The digital contract languages can be quite elaborate: a digital contract could refer to variables defined by prior records (for example, in BitCoin, each coin actually has a fractional value defined by the transaction that created it), or even in a future record (“Edward agrees to sell Ken 1 to 5 sheep at a price of 75 euros per animal, contract to be consummated by December 15 of 2016, in default of which Ken would pay Edward a 15 euro cancellation fee.”).  So here we see references to future payments, conditional outcomes, etc, all of which would be evaluated as a function of the state of the BlockChain, and could evolve as the chain grows longer.
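As a rough illustration of "evaluated as a function of the state of the BlockChain", here is the sheep contract written as a function over the logged records.  The record schema (type, date, count, euros) and the status strings are hypothetical; real contract languages are far richer.

```python
from datetime import date

DEADLINE = date(2016, 12, 15)
PRICE_PER_SHEEP = 75     # euros per animal
CANCELLATION_FEE = 15    # euros, owed by Ken on default


def contract_status(chain, today):
    """Evaluate the Edward/Ken contract against the records logged so far."""
    on_time = [r for r in chain if r["date"] <= DEADLINE]
    sheep = sum(r["count"] for r in on_time if r["type"] == "delivery")
    paid = sum(r["euros"] for r in on_time if r["type"] == "payment")
    if 1 <= sheep <= 5 and paid >= sheep * PRICE_PER_SHEEP:
        return "consummated"
    if today <= DEADLINE:
        return "open"
    return f"default: Ken owes Edward {CANCELLATION_FEE} euros"


chain = []
early = contract_status(chain, date(2016, 12, 1))
chain.append({"type": "delivery", "date": date(2016, 12, 10), "count": 2})
chain.append({"type": "payment", "date": date(2016, 12, 10), "euros": 150})
later = contract_status(chain, date(2016, 12, 11))
```

The same contract yields different answers as the chain grows: nothing in the contract record itself changes, only the log it is evaluated against.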
Portions of such a contract could be concealed by further layers of cryptography.  For example, a digital BlockChain service could log a record on behalf of Edward and Ken without knowing its contents.  Later, either party could demonstrate to an impartial judge that the record was logged and then (if the two parties share the decryption key) could unseal the hidden content, revealing to the judge that the contract included such-and-such terms.
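One simple way to approximate this with nothing but a hash function is a commitment.  To be clear, this swaps in a different technique from the encryption described above: the service logs only a digest, and the parties keep the text and a secret key themselves.  The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def seal(contract_text: str):
    """Return (commitment, key).  Only the commitment goes to the service."""
    key = secrets.token_bytes(32)   # shared privately by the two parties
    commitment = hashlib.sha256(key + contract_text.encode()).hexdigest()
    return commitment, key


def judge_accepts(logged_commitment: str, key: bytes, revealed_text: str) -> bool:
    """The judge recomputes the digest from what the parties reveal."""
    recomputed = hashlib.sha256(key + revealed_text.encode()).hexdigest()
    return hmac.compare_digest(recomputed, logged_commitment)


terms = "Edward sells Ken 1 to 5 sheep at 75 euros per animal"
commitment, key = seal(terms)
honest = judge_accepts(commitment, key, terms)
altered = judge_accepts(commitment, key, terms.replace("75", "57"))
```

The logging service learns nothing useful from the digest, yet neither party can later present different terms, because altered text will not match what was logged.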
This concept, however, touches upon a problem of multiparty commit: extending the BlockChain with such a record in a manner that neither party can later repudiate requires a protocol enabling us to prove (1) that both parties desired to commit this particular record, (2) the record itself was not tampered with, (3) misbehavior by the parties cannot somehow cause the entire system to fail, or render a record inaccessible relative to the original access guarantees.  Such problems can be solved, but they need to be carefully specified and the solutions proved correct.
Notice that absolutely nothing in the above requires that a BlockChain be anonymous.  In fact, a BlockChain can be operated on well known servers by a company, perhaps a bank, that is completely open about the identities of every party.  BlockChain is a concept orthogonal to anonymity.
In fact, many banks are becoming interested in serving in this role: offering BlockChain services to their clients, for a fee, just as banks offer safe-deposit boxes.  And with cryptographic sealing, a BlockChain record can be a kind of digital safe-deposit box, holding something on behalf of the customer that the bank itself has no way to “see”, because it holds encrypted data and lacks the encryption keys, nor would it have any way to guess them or require the customer to produce them.
This said, the same community that created BitCoin has been extremely interested in a kind of anonymous federation in which the user could define his/her own notion of trust (for example, I trust Citizen’s Bank of Ithaca, and it trusts the Alternatives Federal Credit Union and the Cornell Federal Credit Union, and those credit unions trust the National Association of Credit Unions…) and then to define transactions over these trust sets.  The problem quickly becomes very interesting (when a professor uses the term “interesting” that normally means “a topic needing a great deal of research”).
I’m slightly skeptical (when a professor uses the term “slightly skeptical” he or she means “don’t agree with, at all”) that banks would ever engage in anonymous transactions, especially where financial contracts are involved.  So my belief is that the world of anonymous BlockChain transactions will be a non-banking world: some form of global barter community that might use cryptocurrencies like BitCoin or Ethereum and transact through next-generation anonymous BlockChain protocols in which the participants are fully self-defined and autonomous.  But meanwhile, the banking community might begin to offer digital safe-deposit boxes, implementing them in a completely distinct manner.
Banking with BlockChains
What then might a future bank wish to do with BlockChains?
  • Hold digital contracts on behalf of the bank itself and its customers (which to the bank, would not be anonymous entities, because they would pay for the service).
  • Deploy the solution in a highly fault-tolerant and secured manner, protected against tampering.
  • Guarantee zero probability of rollback.  A banking BlockChain will need protocols that are more like Paxos: protocols that guarantee agreement on ordering and on durability, with stability (in a formal sense, a logical property is stable if, once it holds, it holds forever).
  • Guarantee that the solution is compliant with the relevant financial records custody requirements.  These rules can be quite complex: transactions subject to audit or required for tax compliance may need to be held for N years but then provably destroyed once N years have elapsed, and banks may be required to track and disclose certain kinds of transactions to the relevant authorities.  The law probably will need some time to catch up with the technology in this rapidly evolving area, but it seems clear that it will be a vibrant area of future growth for the industry.
There are fascinating questions that arise when a bank has multiple BlockChains in its multiple branches, or when it transacts with other banks.  For example, suppose that the BlockChain for branch A of a bank records the transaction mentioned above (“Edward agrees to sell Ken…”) but the payment by Ken to Edward is recorded into a different BlockChain.  This seems to create a fault-tolerance threat: if the second chain was a different bank branch, could an earthquake or a flood somehow render that chain inaccessible and void the transaction, or at least make it impossible to validate?  What about cross-bank events?  What happens if a bank later fails?
It seems to me that there are some very interesting theory questions here, and it would be fun to try and pose them and develop a rigorous theory of BlockChains for banking.
The existing BitCoin community might not be enthusiastic about such work, because of their long history of working with BitCoin and its strong assumptions about anonymity, and its use of protocols that can roll back.  So financial cryptography may simply need to follow its own path.
My own take on the problems stated earlier is that they suggest a need for cross-BlockChain protocols that provably witness information, so that BlockChain A can learn information from BlockChain B in a manner that A can safely record into its own chain, with no risk of later repudiation.  This would let Ken’s payment to Edward be logged by BlockChain B in the Trumansburg branch of Tompkins County Trust, but then would let Edward query the Citizen’s Bank of Ithaca to learn that yes, he has been paid and should hand over the sheep: BlockChain A would be the one operated by the Citizen’s Bank, and it would run a protocol by which it learned of the payment from chain B in a safe and secure way.  With a bit of work to flesh out the details (for example, does B proactively report the payment to A, or should it wait for A to inquire?), this can certainly be made to work.
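Here is one way the witnessing step might look, purely as a sketch.  Chain A appends a record that quotes the hash of the block in chain B holding Ken's payment, and anyone can later check the quote against B.  The block layout and function names are hypothetical, and the sketch leaves out what a real protocol would need for non-repudiation, namely a signature from B's operator over the witnessed block.

```python
import hashlib
import json


def digest(obj):
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def append(chain, record):
    """Append a block whose hash covers the record and the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev": prev, "record": record,
             "hash": digest({"prev": prev, "record": record})}
    chain.append(block)
    return block


def witness(chain_a, source_name, block_b):
    """Record in chain A what A learned from a block of another chain."""
    return append(chain_a, {"type": "witness", "source": source_name,
                            "block_hash": block_b["hash"],
                            "record": block_b["record"]})


def witness_matches(witness_block, chain_b):
    """Check that chain B still holds an intact block matching A's witness."""
    w = witness_block["record"]
    return any(b["hash"] == w["block_hash"]
               and b["record"] == w["record"]
               and b["hash"] == digest({"prev": b["prev"], "record": b["record"]})
               for b in chain_b)


chain_b = []   # Trumansburg branch of Tompkins County Trust
chain_a = []   # Citizen's Bank of Ithaca
payment = append(chain_b, {"from": "Ken", "to": "Edward", "euros": 75})
w = witness(chain_a, "chain B", payment)
confirmed = witness_matches(w, chain_b)
```

Once the witness is in A's own chain, Edward can be answered from A alone, and any later attempt to alter the payment in B would show up as a mismatch.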
There will always be an element of trust.  For example, how can we be confident that Ken really paid the bill at bank B?  How can we be confident that Edward actually handed over the sheep?  The interface of real-world events to computational events and data records clearly needs attention.
I do not believe in full replication among banks.  So while in this example, A learns something from B, in general, A’s records would live entirely within A.  We do need to ask what the rules would be for performing operations that require replication, or that require cross-bank protocols.  But in general, each BlockChain should be understood as an autonomous system holding private data, and interacting with other systems only under the overarching control of those rules.
Unlike BlockChains with anonymous Byzantine participants, where proof of work is also a protection against a denial of service attack in the form of a flood of transactions that overloads the system, financial BlockChain systems wouldn’t really need any form of proof of work, because they are operated by trusted servers running trusted code (at least, code justifying the same level of trust as we accord to the operating system, the database system, and the various banking application programs).  We might still use Elliptic Curve cryptographic systems to make our BlockChains tamperproof, but the entire “social infrastructure” BitCoin and much of the BlockChain world seeks to build is rendered unnecessary in a banking setting.
Indeed, there is absolutely no reason that banking BlockChains would need to run in slow motion.  They could potentially log any rate of transactions desired.
Ken’s take on all this stuff
Clearly there is a lot one could do in this space; if I wasn’t busy with Derecho, I might move into it.  But I’m far more drawn to the banking style of BlockChain than to the anonymous Byzantine style that prevails in the field today.  My reasons are simple: I honestly think that BitCoin and its cousins are ill-specified and in some ways, provably broken.  How can one ever trust a currency if the records can potentially be invalidated years from now simply because Virgin Space starts a tourism service to Mars?  To me the answer is obvious: we can’t.  Not “it really never happens, don’t worry about it” but “no.”  And once one rejects anonymity and Byzantine behavior and so forth – rejects, in some sense, the political agenda that the Satoshi Nakamoto manifesto set forth at the outset, we’re left with a fairly standard, recognizable form of distributed computing service, with replication for fault-tolerance and high availability, and with strongly consistent cross-site protocols.  This class of questions is solidly in my area of interest.


[1] To trigger this, you need a partitioned situation in which a subgroup of BitCoin miners operates in total isolation.  For example, perhaps you bring BitCoin mining software with you on your one-way trip to Mars and plan to mine for coins to while away the rest of your life there.  A communications breakdown cuts you off from Earth, and blocks you from reporting your discovery of alien supercomputers that use a mysterious technology to solve BitCoin's proof-of-work puzzles.  Using this technology, you create a BlockChain far longer than the one on Earth.  Now, miraculously, the new Virgin Mars Shuttle shows up to rescue you and drop off the first of the Mars tourists, and you are able to merge your BlockChain with the one on Earth.  But yours is twice as long, so the entire Earth BlockChain rolls back, invalidating all the transactions that occurred during your years of isolation.  (Valid coins that were created before your departure and then spent in the now-rolled-back transactions revert to their earlier owners, who get to spend them again, but the bad news is that coins minted during your absence become invalid, as do coins received through transactions in the rolled-back portion of the BlockChain.)
