A Few Thoughts on Distributed Computing: July 2018

Last fall, Scott Aaronson visited Cornell and gave a series of talks on quantum computing. I asked him about quantum-encrypted fiber-optic communication: how can users be sure that the technology actually uses entanglement, and isn't just some form of standard communication link set up to mimic the API to a quantum one?

Background: Quantum cryptographic methods basically mix classical communication with a quantum source of completely secure "noise". They use a normal PKI for endpoint authentication, but introduce a quantum device that sends entangled photons to the two endpoints. By measuring a property whose value is unknown until it is measured (usually polarization), the endpoints extract a genuinely random sequence of 0 and 1 bits that both observe identically (this is the effect of entanglement). Any intruder who attempts to spy on the system would disrupt the entanglement, and the endpoints could detect the disruption. The shared sequence of random bits can then be used as the basis for end-to-end symmetric cryptography.

A vendor offering a product in this space needs to do much more than provide an optical fiber that carries entangled photons. One issue is that the endpoints need to synchronize to ensure that they perform the same test on the same photons, which isn't easy at high data rates. To work around this limit on the raw key rate, you would probably use quantum entanglement to create a symmetric key shared by the endpoints, then use that key with AES or some other high-speed symmetric cipher.
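The synchronization step can be sketched in code. This is a toy, purely classical simulation under idealized assumptions (perfectly correlated outcomes whenever both endpoints choose the same measurement basis, no noise, no eavesdropper). The basis-comparison scheme and the hash-based key derivation are illustrative choices, not a description of any vendor's product. Real systems also perform error correction and privacy amplification, and the resulting key would then feed a cipher such as AES, which Python's standard library does not include.

```python
import hashlib
import random

def measure_pairs(n_pairs, rng):
    """Idealized, hypothetical model of entangled-pair measurements.

    Each endpoint independently picks a basis (0 or 1) for every photon pair.
    Same basis -> identical bits; different bases -> uncorrelated bits.
    """
    a_bases = [rng.randrange(2) for _ in range(n_pairs)]
    b_bases = [rng.randrange(2) for _ in range(n_pairs)]
    a_bits, b_bits = [], []
    for a_basis, b_basis in zip(a_bases, b_bases):
        bit = rng.randrange(2)
        a_bits.append(bit)
        b_bits.append(bit if a_basis == b_basis else rng.randrange(2))
    return a_bases, b_bases, a_bits, b_bits

def sift(my_bases, their_bases, my_bits):
    # The endpoints exchange only their basis choices over the classical
    # channel, then keep the bits where both performed the same test.
    return [bit for mine, theirs, bit in zip(my_bases, their_bases, my_bits) if mine == theirs]

def derive_key(bits):
    # Pack bits MSB-first into bytes, then hash them down to a 256-bit key.
    padded = bits + [0] * (-len(bits) % 8)
    raw = bytes(int("".join(map(str, padded[i:i + 8])), 2) for i in range(0, len(padded), 8))
    return hashlib.sha256(raw).digest()

rng = random.Random(2018)
a_bases, b_bases, a_bits, b_bits = measure_pairs(1024, rng)
alice_key = derive_key(sift(a_bases, b_bases, a_bits))
bob_key = derive_key(sift(b_bases, a_bases, b_bits))
print(alice_key == bob_key, len(alice_key))  # True 32
```

Only about half the pairs survive sifting, which is one reason the raw key rate is far below the rate a symmetric cipher can sustain.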

But suppose we don't trust the vendor. Could the hardware be designed to "cheat"?

Scott's answer: there are many ways to cheat. For example, notice that the scheme outlined above starts with a completely unknown property: entangled photons with totally random polarization. One could instead generate an entangled sequence whose polarizations are known in advance.

The user will have been fooled into using a key that the evil-doer generated (and hence, knows). The user's secrets will actually be out in the open, for anyone (at least, anyone who knows the sequence) to read.

In fact, why even bother to entangle the photons? Why not just take a laser, polarize the output (again in a known way), and then beam the resulting (non-random, non-entangled) output through a half-silvered mirror, so that both endpoints see the same signal? A naïve user would measure polarizations, extract the same sequence at each end, and think that the device was working flawlessly.

Beyond this, one can imagine endpoint hardware that genuinely goes to all the trouble of extracting random data from quantum-entangled photons, but then ignores the random input and substitutes a pre-computed key that the evil-doer generated months ago and stored in a table for later use. Here, the buyer can actually go to the trouble of verifying the entanglement, confirm that the device is genuinely intrusion-tolerant, and so forth. Yet we would have zero security, because the broken endpoint logic ignores the quantum-random input.

Bell's Theorem. Setting such cases to the side, Scott also pointed out that for the entangled photons on the fiber-optic cable, there actually is a good way to test that the device is working. He explained that in the lab, physicists test such technologies by running a "Bell's Inequality" experiment.

As you may know, Bell's Theorem was formulated by John Stewart Bell as a way to test one of the theories about quantum entanglement: some physicists believed at the time that "hidden variables" were at the root of entanglement, and were actively exploring ways that such variables could arise. Bell showed that true entanglement could be distinguished from any local hidden-variable system using a series of measurements whose statistics would differ in the two cases. Scott's point was that we could run a Bell's inequality experiment on the fiber. It would give unambiguous evidence that the photons emerging from our fiber are genuinely entangled.
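Here is a numerical sketch of Scott's point, using the CHSH form of Bell's inequality (one common variant; which test Scott had in mind isn't specified). Any local hidden-variable model satisfies |S| ≤ 2, while entangled photons can reach 2√2. The "fake" source is a hypothetical model of the polarized laser and half-silvered mirror described earlier: every photon carries the same known polarization, so each endpoint's result is a deterministic +1 or -1.

```python
import math

# Standard CHSH measurement angles (degrees) that maximize the quantum violation.
A1, A2, B1, B2 = 0.0, 45.0, 22.5, 67.5

def chsh(E):
    """S = E(a, b) - E(a, b') + E(a', b) + E(a', b'); local hidden variables give |S| <= 2."""
    return E(A1, B1) - E(A1, B2) + E(A2, B1) + E(A2, B2)

def quantum_E(a, b):
    # Quantum prediction for polarization-entangled photons in the |Phi+> state.
    return math.cos(math.radians(2 * (a - b)))

def fake_source_E(lam):
    """Correlation for a hypothetical classical source whose photons all have
    the same known polarization `lam` (degrees). Each endpoint's outcome is a
    deterministic +1/-1: +1 if the photon is more likely than not to pass
    a polarizer set at `setting`."""
    def outcome(setting):
        return 1 if math.cos(math.radians(2 * (setting - lam))) >= 0 else -1
    return lambda a, b: outcome(a) * outcome(b)

s_quantum = chsh(quantum_E)
s_fake = max(abs(chsh(fake_source_E(lam))) for lam in range(180))
print(round(s_quantum, 4), s_fake)  # 2.8284 2
```

However the fake source's polarization is chosen, |S| never exceeds 2, so the Bell test exposes it even though both endpoints would read identical bit sequences.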

But a Bell's test would only cover the technology up to the endpoints of the fiber carrying the entangled photons, and we could only run such a test with "unrestricted" access to the medium. Very few products could be deconstructed in this way.

Bottom line? Clearly, it is vitally important that quantum-encrypted communications technology come from a fully trusted vendor. A compromised vendor could sell an undetectably flawed technology.

Why is this relevant to BlockChains? A BlockChain technology is only as secure as the cryptographic package used in its block-entanglement step (the hash that links each block to its predecessor). Suppose, for example, that I created a cryptographic package called SHA-256, but instead of using the actual SHA-256 algorithm, implemented it in some trivial and insecure way. As long as that package produces a random-looking hash of the input, of the proper length, that varies for different inputs, you might be fooled.
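To see how easily a counterfeit can look right, here is a deliberately broken, hypothetical fake_sha256 (not taken from any real product) that sorts the input bytes before hashing. Its output has the proper length, looks random, and changes when the input's contents change, yet any reordering of the same bytes produces the identical "hash".

```python
import hashlib

def fake_sha256(data: bytes) -> str:
    # HYPOTHETICAL counterfeit: sorts the input bytes, then hashes them.
    # The output has SHA-256's length and looks random, but byte order is ignored.
    return hashlib.sha256(bytes(sorted(data))).hexdigest()

x = b"alice pays bob 10"
y = b"bob pays alice 10"  # same bytes, different order, different meaning

print(len(fake_sha256(x)))                                       # 64, like real SHA-256 hex
print(fake_sha256(x) != fake_sha256(b"alice pays bob 99"))       # True: varies with content
print(fake_sha256(x) == fake_sha256(y))                          # True: trivial collision
print(hashlib.sha256(x).digest() == hashlib.sha256(y).digest())  # False: real SHA-256 differs
```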

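What's the risk? Well, if I could trick you into using my product, it would look like a BlockChain and seem secure. Yet suppose that the chain included a block X containing a transaction I find "awkward". If my fake hashing system lacks a cryptographic property called "second-preimage resistance" (also known as weak collision resistance, and implied by the strong collision resistance that SHA-256 is designed to provide), I could construct a different block Y that hashes to the same value as X and substitute it for X. Because every later block refers back to that same hash value, the rest of the chain would still check out: I would have modified the stable body of the chain, and you wouldn't be able to prove that this tampering had occurred. Obviously this defeats the whole point.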
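A toy hash chain makes the substitution concrete. In this sketch, the block layout and the hypothetical fake_sha256 (a counterfeit that ignores byte order) are illustrative assumptions. Each block stores the hash of its predecessor; with the counterfeit, swapping block X for a rearranged block Y leaves every back-pointer valid, while the same swap on a chain built with real SHA-256 is caught.

```python
import hashlib

def fake_sha256(data: bytes) -> str:
    # HYPOTHETICAL counterfeit "SHA-256": sorts the bytes before hashing,
    # so any permutation of a block collides with the original.
    return hashlib.sha256(bytes(sorted(data))).hexdigest()

def real_sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_chain(payloads, h):
    # Each block stores its payload and the hash of the previous block.
    chain, prev = [], "0" * 64
    for payload in payloads:
        chain.append({"prev": prev, "payload": payload})
        prev = h(prev.encode() + payload)
    return chain

def verify_chain(chain, h):
    # Every block's back-pointer must equal the recomputed hash of its predecessor.
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        prev = h(prev.encode() + block["payload"])
    return True

payloads = [b"genesis", b"alice pays bob 10", b"carol pays dan 5"]

forged = build_chain(payloads, fake_sha256)
forged[1]["payload"] = b"bob pays alice 10"   # block Y replaces block X
print(verify_chain(forged, fake_sha256))      # True: tampering goes unnoticed

honest = build_chain(payloads, real_sha256)
honest[1]["payload"] = b"bob pays alice 10"
print(verify_chain(honest, real_sha256))      # False: real SHA-256 catches it
```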

Now, if you were to check the output against a different, independently trusted SHA-256 implementation, the values would differ. Yet how many people audit their BlockChain products using technology totally independent of anything the BlockChain vendor provided? In this example, even using the SHA-256 code provided by your vendor is a mistake: that SHA-256 code is broken.
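An independent check might look roughly like this. The reference implementation is Python's hashlib, first sanity-checked against two published SHA-256 test vectors; vendor_sha256 is a hypothetical stand-in for the vendor's code, modeled here as a counterfeit that sorts bytes before hashing.

```python
import hashlib

# Published SHA-256 test vectors, used to sanity-check the *reference* implementation.
TEST_VECTORS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

def reference_sha256(data: bytes) -> str:
    # Independent implementation, not supplied by the vendor.
    return hashlib.sha256(data).hexdigest()

def vendor_sha256(data: bytes) -> str:
    # HYPOTHETICAL stand-in for the vendor's hash: a counterfeit that sorts bytes first.
    return hashlib.sha256(bytes(sorted(data))).hexdigest()

# Step 1: confirm that our reference really computes SHA-256.
assert all(reference_sha256(m) == d for m, d in TEST_VECTORS.items())

# Step 2: compare the vendor's output with the reference on real block contents.
blocks = [b"genesis", b"alice pays bob 10", b"carol pays dan 5"]
mismatches = [blk for blk in blocks if vendor_sha256(blk) != reference_sha256(blk)]
print(len(mismatches))  # 3: every block exposes the counterfeit
```

Notice that this particular counterfeit passes both test vectors, because b"abc" is already in sorted order and the empty string has nothing to reorder. That is why the comparison has to run on the actual chain data, not just on textbook inputs.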

Moreover, there are other ways that one could potentially trick a user. A SHA-256 hash computed on just a portion of the transaction record could look completely valid (it would be valid, for that portion of the block), and yet would leave room for tampering with any bytes not covered by the hash. Your audit would thus need to really understand the BlockChain data structure, which isn't as simple as you might expect. Many BlockChain vendors use fairly complex data structures, and it isn't trivial to extract just the chain of blocks and hashes to check that each hash actually covers all the data in its block. If you use any vendor-supplied code for this step at all, you risk the vendor tool covering up any tampering when you go to audit the chain.
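The partial-coverage trick is easy to reproduce. In this hypothetical sketch, the vendor's block hash covers only a header (prev, height) and not the transactions, while the auditor hashes a canonical JSON serialization of the whole block. JSON is an assumption made for brevity; a real audit must independently reproduce the chain's actual byte-level format.

```python
import hashlib
import json

def vendor_block_hash(block: dict) -> str:
    # HYPOTHETICAL flawed scheme: hashes only the header fields, not the transactions.
    header = json.dumps({"prev": block["prev"], "height": block["height"]}, sort_keys=True)
    return hashlib.sha256(header.encode()).hexdigest()

def full_block_hash(block: dict) -> str:
    # Independent audit: hash a canonical serialization of the *entire* block.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

block = {"prev": "00" * 32, "height": 1, "txs": ["alice pays bob 10"]}
vendor_before, full_before = vendor_block_hash(block), full_block_hash(block)

block["txs"][0] = "alice pays mallory 10"  # tamper with bytes outside the vendor's hash
print(vendor_block_hash(block) == vendor_before)  # True: tampering is invisible
print(full_block_hash(block) == full_before)      # False: full coverage catches it
```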

My point? This is a genuine risk. An immense number of companies are jumping to use BlockChain for diverse mission-critical purposes. These companies are relying on the guarantee that once the blocks in the chain become stable, nobody could tamper with the record. Yet what we see here is that a BlockChain is only as good as the vendor and the cryptographic package, and that the chain can only be trusted if you have some way to independently test its integrity. And you had better really run that test, often.

My advice to anyone working with BlockChain is to hire a trusted independent consultant to build a BlockChain test program that audits the actual chain daily, using completely independent technology. If the vendor is using SHA-256 for hashing, your auditing company should find a very trustworthy SHA-256 implementation and base the audit on that. If the chain uses some other hashing method, the same goes for it: this approach works for any standard hash function.
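A daily audit program along these lines might walk the exported chain and recompute every hash with an implementation the vendor never touched. The Block record, the export format, and the hashing rule (SHA-256 over the previous hash followed by the full block body) are all hypothetical; a real auditor would substitute the chain's documented format.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    # HYPOTHETICAL export format for one block.
    prev_hash: str    # hash of the previous block, as stored in the chain
    body: bytes       # full serialized block contents (everything the hash must cover)
    stored_hash: str  # hash of this block, as recorded by the vendor

def block_digest(prev_hash: str, body: bytes) -> str:
    # Assumed hashing rule: SHA-256 over the previous hash's bytes, then the body.
    return hashlib.sha256(bytes.fromhex(prev_hash) + body).hexdigest()

def audit_chain(blocks):
    """Return a list of problems; an empty list means the chain checked out."""
    problems, expected_prev = [], "00" * 32
    for i, blk in enumerate(blocks):
        if blk.prev_hash != expected_prev:
            problems.append(f"block {i}: back-pointer mismatch")
        recomputed = block_digest(blk.prev_hash, blk.body)
        if recomputed != blk.stored_hash:
            problems.append(f"block {i}: stored hash does not match independent SHA-256")
        expected_prev = recomputed  # follow our own hashes, not the vendor's
    return problems

def make_chain(bodies):
    # Builds an honest chain, standing in for the vendor's export.
    blocks, prev = [], "00" * 32
    for body in bodies:
        digest = block_digest(prev, body)
        blocks.append(Block(prev, body, digest))
        prev = digest
    return blocks

chain = make_chain([b"genesis", b"alice pays bob 10", b"carol pays dan 5"])
print(audit_chain(chain))  # []
chain[1].body = b"alice pays mallory 10"
print(audit_chain(chain))  # flags block 1's hash and block 2's back-pointer
```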

What if your vendor is offering a non-standard BlockChain that runs super-fast by virtue of using a new but proprietary hashing technology, or a new but non-standard secret data structure? My advice is simple: if the vendor won't supply you with enough detail to let you audit the chain, don't trust it.

Wednesday, 11 July 2018

Why we need a "Bell's test" for BlockChains