BitCoin has spawned a whole series of information
revolutions: some concerned with digital currency, some more focused around
anonymity, and some concerned with the potential for cryptographic block
chains. I thought I might say a few
words on the topic. But I should maybe
remark first that I’m not a world expert on this topic, although I’ve read a
great deal and talked with people who are doing cutting edge research. So take these as a few thoughts from someone
on the periphery.
I didn't give much thought to BitCoin for a long time: obviously, I was aware of it, and read about arrests of the folks running Silk Road, which was a dark
web site (meaning you accessed it over cryptographically tunneled links)
hosting a kind of Amazon.com of illicit substances, prostitution, and
apparently even murder for hire. I won’t
summarize the story, but I definitely recommend reading about it. Anyhow, Silk Road transactions were all
denominated in BitCoin, because the currency was reasonably widely available
and offered anonymity.
But I always viewed BitCoin as an interesting application built using a technically uninteresting protocol: we've worked with replicated append-only persistent logging for ages (after all, this is what Paxos is about), and Byzantine versions too (consider the work that Castro and Liskov did at MIT on practical Byzantine replication). Thus the fact that BitCoin had such a protocol within it was kind of unimportant to me.
But sometimes, a technology takes hold and you are sort of forced to begin to think about it. In my crowd, people talk about BitCoin quite a lot these days, and are getting interested in BlockChain technology. So I keep finding myself pulled into discussions about the technology. The topic I think is worth discussing here
isn’t specific to the way BitCoin works, but relates to its BlockChain.
We can understand Bitcoin in terms of
distinct aspects:
BitCoin is a digital currency (or commodity) and people buy and sell them, and use them in transactions. So one dimension is concerned with BitCoin in this role as currency.
BitCoin centers on a style of agreement protocol that builds an append-only log of transactions: the BlockChain. The specific BitCoin log has the difficulty that because the protocol itself is capable of rollback (meaning that sometime after block A is appended to the BlockChain, subsequent events can replace A with some other block B, invalidating transactions contained in A), there is never absolute certainty about the actual final state of the BlockChain. In fact, although rollbacks of more than 3 blocks at the tail of the BlockChain are very rare, an omniscient adversary could construct a situation that would roll the entire BitCoin BlockChain back to the first block.
The protocol is designed for anonymous participants and is intended to tolerate some level of Byzantine mischief, but the precise conditions for progress are hard to pin down, in part because arbitrary rollback is a part of the model.
The protocol has an implied assumption that
the network is fast enough to fully replicate updates (via gossip broadcast)
within a short amount of time, which would normally be seconds when computers
are active. Of course when a computer
reconnects after being shut down for a while, it takes longer because it will
need to catch up. This assumption isn’t
really stated either.
The conditions under which a particular
BitCoin transaction block has become fully stable (would never roll back) are
somewhat fuzzy, but because a rollback-free BlockChain prefix is strong enough
to achieve consensus, they cannot be weaker than the bounds for progress in a
consensus (uniform agreement) system operating with a partially synchronous
network. This question has been studied
formally, and results by Lynch, Keidar and others would seemingly apply.
In fact it may be possible to show that in
a rollback-free Blockchain prefix, created by the Bitcoin protocol, there is a
sense in which the protocol runs as a series of epochs, with the (anonymous)
members of the present epoch effectively voting in the (anonymous) members that
will comprise the next epoch. I’ve
wanted to look at the problem carefully, but haven’t had the required time (anyhow,
the proof, if I am correct, might really tax my formal skills). (If you can prove this, I’m happy for you to
take credit, but if you didn’t formulate the problem prior to reading this blog
entry, I would appreciate credit for “proposing the problem”!)
The solution makes heavy use of anonymity. The participating endpoint computers that
hold Bitcoins and mine for new Blockchain extensions are all named by
cryptographic keys, and can create new names for themselves as often as they
like. The proof-of-work aspect of the
Blockchain protocol prevents what are called Sybil attacks, in which some
computer hijacks a system by pretending to be an enormous number of computers
operating in concert. Without
anonymity and the fear of Byzantine behavior, BitCoin’s proof of work would not
be needed.
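To make the proof-of-work idea concrete, here is a minimal sketch in Python. It is a simplified stand-in, not BitCoin's actual block header format or difficulty encoding: the function names, the 8-byte nonce, and the single SHA-256 pass are all assumptions chosen for illustration.

```python
import hashlib


def _work_digest(payload: bytes, nonce: int) -> int:
    """Hash the payload together with a candidate nonce (illustrative layout)."""
    digest = hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def proof_of_work(payload: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose hash has `difficulty_bits` leading zero bits.

    Expected cost is about 2**difficulty_bits hash evaluations.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _work_digest(payload, nonce) >= target:
        nonce += 1
    return nonce


def verify_work(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash evaluation."""
    return _work_digest(payload, nonce) < (1 << (256 - difficulty_bits))


nonce = proof_of_work(b"candidate block", difficulty_bits=12)
print(verify_work(b"candidate block", nonce, difficulty_bits=12))
```

The point of the asymmetry is that the cost is paid per hash, not per identity: a participant who invents a thousand new names still has only the hashing power of one machine, which is why this style of puzzle blunts a Sybil attack.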
The power of the BlockChain
The modern view is that the concept of a
BlockChain should be taken as a separate entity: in this modern perspective, BitCoin
needs a BlockChain with certain additional special properties, and implements
one using its own probabilistic protocol.
Viewed in this more modern way, a
BlockChain is simply:
An append-only log of records.
The records are ordered, which is evident from the first requirement, but also include a cryptographic signature that somehow witnesses the prior blocks. Thus the integrity of the BlockChain can easily be checked by scanning it front to back or back to front, confirming that each record correctly countersigns the prior ones.
The content of each of the blocks in the BlockChain is similarly protected: each block has a cryptographic signature spanning the information it holds. Various schemes can be used: a hash over the records, a Merkle tree of signatures, etc. But the upshot is that a valid BlockChain has a form of cryptographic proof of integrity built into it. An intruder who seeks to tamper with the chain would need to rewrite the entire suffix starting with the modified record, and the deployment would often make this prohibitively difficult (for example, in BitCoin, the whole BlockChain is fully replicated to all participants).
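The three properties just listed can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: it uses plain SHA-256 hashes where a real deployment might use digital signatures or a Merkle tree, and the names `Block`, `append`, `verify` and `GENESIS` are hypothetical, not taken from any BlockChain product.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int
    records: tuple   # the block's contents (strings, in this sketch)
    prev_hash: str   # witnesses the prior block, and through it all earlier ones

    def digest(self) -> str:
        """Hash spanning the block's contents and its link to the past."""
        body = json.dumps([self.index, list(self.records), self.prev_hash])
        return hashlib.sha256(body.encode()).hexdigest()


def append(chain: list, records: list) -> None:
    """Extend the log; existing blocks are never modified."""
    prev = chain[-1].digest() if chain else GENESIS
    chain.append(Block(len(chain), tuple(records), prev))


def verify(chain: list, trusted_head: str = None) -> bool:
    """Scan front to back, checking that each block links to its predecessor.

    Tampering with the final block is only detectable if the caller supplies
    a digest of the head obtained from somewhere trustworthy.
    """
    prev = GENESIS
    for i, block in enumerate(chain):
        if block.index != i or block.prev_hash != prev:
            return False
        prev = block.digest()
    return trusted_head is None or prev == trusted_head
```

Changing any record changes that block's digest, which breaks the link held by the next block; an intruder would therefore have to rewrite every later block too, and every replica would have to accept the rewrite.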
Described in this way, a BlockChain could be implemented in many ways -- BitCoin implements it as a protocol between what it portrays as anonymous, Byzantine participants, but actually you could store your BlockChain just as easily on a standard cloud computing framework like Azure, and it could store all sorts of data -- not just BitCoin transactions. In fact, and I'll expand on this below, you might be wise to do this (to use a more standard way of storing your BlockChain). I say this because many of the purported properties of the BitCoin protocol simply do not hold for the protocol BitCoin implements. The protocol is just wrong. In fact I'm not sure it even deserves to be called "wrong" because to be wrong, it would need a clear specification, so that I could say "in such and such a situation, it violates its specification". But BitCoin doesn't even have a real specification for its BlockChain: this is a case of a solution without a problem statement. So how can it be wrong? It isn't even wrong: wrong would be better!
Let's try and break things down and tackle them step by step.
First, just for clarity, what I am pointing out is this: obviously, the actual records within a BlockChain
could record digital currency transactions, which is the only use made of them
by BitCoin, but you can also create infrastructure to store far more elaborate
information into the chain, and could then store them into any kind of database that won't lose them or permit tampering. There are a number
of new commercial products that focus on this idea: they define higher level
languages for encoding digital contracts and then store them into some form of
highly reliable storage system.
Where BlockChain systems depart from this is that they implement the BlockChain as a highly replicated structure: every single BitCoin participant ends up with a full copy and sees every update to it (updates always take the form of appends: new records that extend the length of the chain). So we have the abstraction of a record recording transactions, then the abstraction of an append-only log with cryptographic tamperproofing, and finally a way to implement such a log over what turns out to be a gossip protocol. These are protocols in which system participants share full information with one another: A contacts B, and then A sends B whatever A knows that B lacks, and vice versa. Gossip can be very robust, and BitCoin benefits from that. A further advantage is that gossip can operate without tracking the full system membership -- transitive coverage of the full set is sufficient. However, let's return to this below, because it isn't as simple as it sounds.
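The exchange "A sends B whatever A knows that B lacks, and vice versa" can be sketched as a toy push-pull gossip simulation. This is an illustration only, not BitCoin's actual peer-to-peer protocol: the ring topology, the node names, and the functions `gossip_round` and `run_until_converged` are assumptions made for the example.

```python
import random


def gossip_round(state: dict, neighbors: dict, rng: random.Random) -> None:
    """Each node contacts one random neighbor; the pair ends up with the union."""
    for a in sorted(state):
        b = rng.choice(neighbors[a])
        merged = state[a] | state[b]
        state[a] = set(merged)
        state[b] = set(merged)


def run_until_converged(state, neighbors, rng, max_rounds=1000):
    """Return the number of rounds until every node knows every record."""
    everything = set().union(*state.values())
    for rounds in range(1, max_rounds + 1):
        gossip_round(state, neighbors, rng)
        if all(known == everything for known in state.values()):
            return rounds
    return None  # did not converge within max_rounds


# Eight nodes in a ring: each node knows only its two neighbors,
# never the full membership.
names = [f"n{i}" for i in range(8)]
neighbors = {n: [names[(i - 1) % 8], names[(i + 1) % 8]]
             for i, n in enumerate(names)}
state = {n: set() for n in names}
state["n0"].add("block-1")
state["n5"].add("block-2")
print(run_until_converged(state, neighbors, random.Random(0)))
```

No node in the ring ever learns who all the participants are, yet both records reach everyone because the neighbor relation covers the full set transitively.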
The digital contract languages can be quite
elaborate: a digital contract could refer to variables defined by prior records
(for example, in BitCoin, each coin actually has a fractional value defined by
the transaction that created it), or even in a future record (“Edward agrees to
sell Ken 1 to 5 sheep at a price of 75 euros per animal, contract to be
consummated by December 15 of 2016, in default of which Ken would pay Edward a
15 euro cancellation fee.”). So here we
see references to future payments, conditional outcomes, etc, all of which
would be evaluated as a function of the state of the BlockChain, and could
evolve as the chain grows longer.
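As a rough illustration of a contract whose meaning is "evaluated as a function of the state of the BlockChain", here is a sketch of the sheep example in Python. The record format, the `SheepContract` class and the `evaluate` function are hypothetical; real digital contract languages are far richer, and this encodes only one reading of the example's terms.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SheepContract:
    seller: str
    buyer: str
    min_sheep: int
    max_sheep: int
    price_eur: int            # per animal
    deadline: date
    cancellation_fee_eur: int


def evaluate(contract: SheepContract, later_records: list, today: date) -> str:
    """Work out what is owed, given the records logged after the contract."""
    for rec in later_records:
        if (rec["type"] == "delivery"
                and rec["date"] <= contract.deadline
                and contract.min_sheep <= rec["sheep"] <= contract.max_sheep):
            total = rec["sheep"] * contract.price_eur
            return f"{contract.buyer} owes {contract.seller} {total} EUR"
    if today > contract.deadline:
        # Deadline passed with no qualifying delivery: the default clause applies.
        fee = contract.cancellation_fee_eur
        return f"{contract.buyer} owes {contract.seller} {fee} EUR"
    return "pending"


contract = SheepContract("Edward", "Ken", 1, 5, 75, date(2016, 12, 15), 15)
delivery = {"type": "delivery", "sheep": 3, "date": date(2016, 12, 10)}
print(evaluate(contract, [delivery], today=date(2016, 12, 11)))
```

The same contract record yields different answers as the chain grows: "pending" before anything else is logged, a purchase price once a delivery record appears, or the cancellation fee once the deadline has passed.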
Portions of such a contract could be
concealed by further layers of cryptography.
For example, a digital BlockChain service could log a record on behalf
of Edward and Ken without knowing its contents.
Later, either party could demonstrate to an impartial judge that the
record was logged and then (if the two parties share the decryption key) could
unseal the hidden content, revealing to the judge that the contract included
such-and-such terms.
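One simple way to log a record the service cannot read is sketched below. Note the substitution: instead of encrypting the contract, this sketch logs a salted hash commitment, which needs only Python's standard library. The function names `seal` and `judge_verifies` are hypothetical.

```python
import hashlib
import hmac
import secrets


def seal(contract: str) -> tuple:
    """Return (commitment, nonce). Only the commitment goes to the log service.

    The random nonce keeps the service from guessing the contract by hashing
    likely texts; Edward and Ken both keep the nonce and the contract text.
    """
    nonce = secrets.token_bytes(32)
    commitment = hashlib.sha256(nonce + contract.encode()).hexdigest()
    return commitment, nonce


def judge_verifies(logged_commitment: str, contract: str, nonce: bytes) -> bool:
    """The judge recomputes the hash from the revealed contract and nonce."""
    recomputed = hashlib.sha256(nonce + contract.encode()).hexdigest()
    return hmac.compare_digest(recomputed, logged_commitment)


text = "Edward agrees to sell Ken 1 to 5 sheep at 75 euros per animal"
commitment, nonce = seal(text)
print(judge_verifies(commitment, text, nonce))
```

A commitment proves what was logged but cannot recover the text if both parties lose it; storing ciphertext, as the paragraph above describes, avoids that at the price of key management.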
This concept, however, touches upon a
problem of multiparty commit: extending the BlockChain with such a record in a
manner that neither party can later repudiate requires a protocol enabling us
to prove (1) that both parties desired to commit this particular record, (2)
that the record itself was not tampered with, and (3) that misbehavior by the
parties cannot somehow cause the entire system to fail, or render a record
inaccessible relative to the original access guarantees.
Such problems can be solved, but they need to be carefully specified and
the solutions proved correct.
Notice that absolutely nothing in the above
requires that a BlockChain be anonymous.
In fact, a BlockChain can be operated on well known servers by a
company, perhaps a bank, that is completely open about the identities of every
party. BlockChain is a concept
orthogonal to anonymity.
In fact, many banks are becoming interested
in serving in this role: offering BlockChain services to their clients, for a
fee, just as banks offer safe-deposit boxes.
And with cryptographic sealing, a BlockChain record can be a kind of
digital safe-deposit box, holding something on behalf of the customer that the
bank itself has no way to “see”, because it holds encrypted data and lacks the
encryption keys, nor would it have any way to guess them or require the
customer to produce them.
This said, the same community that created
BitCoin has been extremely interested in a kind of anonymous federation in
which the user could define his/her own notion of trust (for example, I trust
Citizen’s Bank of Ithaca, and it trusts the Alternatives Federal Credit Union
and the Cornell Federal Credit Union, and those credit unions trust the
National Association of Credit Unions…) and then to define transactions over
these trust sets. The problem quickly
becomes very interesting (when a professor uses the term “interesting” that
normally means “a topic needing a great deal of research”).
I’m slightly skeptical (when a professor
uses the term “slightly skeptical” he or she means “don’t agree with, at all”)
that banks would ever engage in anonymous transactions, especially where
financial contracts are involved. So my
belief is that the world of anonymous BlockChain transactions will be a
non-banking world: some form of global barter community that might use
cryptocurrencies like BitCoin or Ethereum and transact through next-generation
anonymous BlockChain protocols in which the participants are fully self-defined
and autonomous. But meanwhile, the banking
community might begin to offer digital safe-deposit boxes, implementing them in
a completely distinct manner.
Banking with BlockChains
What then might a future bank wish to do
with BlockChains?
Hold digital contracts on behalf of the
bank itself and its customers (which to the bank, would not be anonymous
entities, because they would pay for the service).
Deploy the solution in a highly
fault-tolerant and secured manner, protected against tampering.
A banking BlockChain should have zero
probability of rollback, so these protocols will need to be more like Paxos:
protocols that guarantee agreement on
ordering and on durability, with stability (in a formal sense, a logical
property is stable if, once it holds, it holds forever).
Guarantee that the solution is compliant
with the relevant financial records custody requirements. These rules can be quite complex:
transactions subject to audit or required for tax compliance may need to be
held for N years but then provably destroyed once N years have elapsed, and
banks may be required to track and disclose certain kinds of transactions to
the relevant authorities. The law
probably will need some time to catch up with the technology in this rapidly
evolving area, but it seems clear that it will be a vibrant area of future
growth for the industry.
There are fascinating questions that arise
when a bank has multiple BlockChains in its multiple branches, or when it
transacts with other banks. For example,
suppose that the BlockChain for branch A of a bank records the transaction
mentioned above (“Edward agrees to sell Ken…”) but the payment by Ken to Edward
is recorded into a different BlockChain.
This seems to create a fault-tolerance threat: if the second chain was a
different bank branch, could an earthquake or a flood somehow render that chain
inaccessible and void the transaction, or at least make it impossible to
validate? What about cross-bank
events? What happens if a bank later fails?
It seems to me that there are some very
interesting theory questions here, and it would be fun to try and pose them and
develop a rigorous theory of BlockChains for banking.
The existing BitCoin community might not be
enthusiastic about such work, because of their long history of working with
BitCoin and its strong assumptions about anonymity, and its use of protocols
that can roll back. So financial
cryptography may simply need to follow its own path.
My own take on the problems stated earlier
is that they suggest a need for cross-BlockChain protocols that provably
witness information, so that BlockChain A can learn information from BlockChain
B in a manner that A can safely record into its own chain, with no risk of
later repudiation. This would let Ken’s
payment to Edward be logged by BlockChain B in the Trumansburg branch of
Tompkins County Trust, but then would let Edward query the Citizen’s Bank of
Ithaca to learn that yes, he has been paid and should hand over the sheep:
BlockChain A would be the one operated by the Citizen’s Bank, and it would run
a protocol by which it learned of the payment from chain B in a safe and secure
way. With a bit of work to flesh out the
details (for example, does B proactively report the payment to A, or should it
wait for A to inquire?), this can certainly be made to work.
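A very rough sketch of the "A learns from B" step might look like the following. Everything here is an assumption made for illustration: the receipt layout, the function names, and above all the use of an HMAC with a key shared between the two banks. A shared key lets A check that a receipt is authentic, but it does not give non-repudiation, since A could have produced the tag itself; a real protocol would need public-key signatures for that.

```python
import hashlib
import hmac
import json


def _tag(key: bytes, body: dict) -> str:
    """Authenticate a receipt body (stand-in for a real digital signature)."""
    message = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_receipt(b_key: bytes, b_height: int, b_block_hash: str,
                  payment: str) -> dict:
    """Chain B attests that `payment` is recorded in its block at `b_height`."""
    body = {"chain": "B", "height": b_height,
            "block_hash": b_block_hash, "payment": payment}
    return {"body": body, "tag": _tag(b_key, body)}


def witness(a_log: list, b_key: bytes, receipt: dict) -> bool:
    """Chain A appends the receipt to its own log only if it checks out."""
    if not hmac.compare_digest(_tag(b_key, receipt["body"]), receipt["tag"]):
        return False
    a_log.append(receipt)
    return True


shared_key = b"k" * 32   # hypothetical key agreed between the two banks
a_log = []
receipt = issue_receipt(shared_key, 7, "ab" * 32, "Ken pays Edward 225 EUR")
print(witness(a_log, shared_key, receipt))
```

Because the receipt names a specific block and block hash in B, a later audit could ask B to show that block and confirm the payment really is in its chain.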
There will always be an element of
trust. For example, how can we be
confident that Ken really paid the bill at bank B? How can we be confident that Edward actually
handed over the sheep? The interface of
real-world events to computational events and data records clearly needs
attention.
I do not believe in full replication among
banks. So while in this example, A
learns something from B, in general, A’s records would live entirely within
A. We do need to ask what the rules
would be for performing operations that require
replication, or that require cross-bank protocols. But in general, each BlockChain should be
understood as an autonomous system holding private data, and interacting with
other systems only under the overarching control of those rules.
Unlike BlockChains with anonymous Byzantine
participants, where proof of work is also a protection against a denial of
service attack in the form of a flood of transactions that overloads the
system, financial BlockChain systems wouldn’t really need any form of proof of
work, because they are operated by trusted servers running trusted code (at
least, code justifying the same level of trust as we accord to the operating
system, the database system, and the various banking application programs). We might still use Elliptic Curve
cryptographic systems to make our BlockChains tamperproof, but the entire
“social infrastructure” BitCoin and much of the BlockChain world seeks to build
is rendered unnecessary in a banking setting.
Indeed, there is absolutely no reason that
banking BlockChains would need to run in slow motion. They could potentially log any rate of
transactions desired.
Ken’s take on all this stuff
Clearly there is a lot one could do in this
space; if I weren’t busy with Derecho, I might move into it. But I’m far more drawn to the banking style
of BlockChain than to the anonymous Byzantine style that prevails in the field
today. My reasons are simple: I honestly
think that BitCoin and its cousins are ill-specified and in some ways, provably
broken. How can one ever trust a currency if the records can potentially be invalidated
years from now simply because Virgin Space starts a tourism service to
Mars? To me the answer is obvious: we
can’t. Not “it really never happens,
don’t worry about it” but “no.” And once
one rejects anonymity and Byzantine behavior and so forth – rejects, in some
sense, the political agenda that the Satoshi Nakamoto manifesto set forth at
the outset, we’re left with a fairly standard, recognizable form of distributed
computing service, with replication for fault-tolerance and high availability,
and with strongly consistent cross-site protocols. This class of questions is solidly in my area
of interest.