A Few Thoughts on Distributed Computing: August 2018

Wednesday, 22 August 2018

Inventing the intelligent, active, edge

At the recent Microsoft faculty research summit, I was thrilled (and also somewhat relieved) to see that the company decided to run a whole workshop emphasizing systems (including networking), and the ways that systems can support machine learning. It seemed clear that Microsoft has decided to become a major player in what they call "intelligent edge" computing and is urging all of us to jump on board.

These terms may be new, so I thought I might summarize the trend, based on my current understanding of it. A good place to start is with a little mini-tutorial on what cloud computing infrastructures have been doing up to now, because the intelligent active edge really builds on the current architecture (over time, it will be more and more differentiated, but today, the overlap is substantial).

So: Today, we have a cloud dominated by a style of computing that prevailed in the 2000-2010 web server and services period. In a first draft of this essay I wrote a page or so about this topic but it got heavy on jargon and I felt that it was taking too long to get to the point. So I'll get there very quickly:

Many people think of the cloud as a way to rent Linux containers, but the bigger and more exciting trend focuses on elastic platforms that are event driven, connected by various ways to pass objects from layer to layer, and customizable by providing little event handlers: "functions".
Examples of platforms like this include Amazon Lambda, Microsoft Azure Functions, Google Tensor Flow, Spark/DataBricks RDDs.
The connections tend to be via some form of queuing service (Amazon SQS, Apache Kafka, Azure Service Queues, IBM's MQSeries, OpenSplice, etc). Big objects are often just stored into a large file system (S3, Google GFS, Hadoop HDFS, etc).
Everything is sharded from start to finish. Data shows up on HTTP connections (web requests to web services), but programmable edge routers (like Amazon Route 53) extract keys and use standard distributed hashing schemes to vector the requests into "shards", within which they may additionally load-balance.
We cache everything in sight, using DHTs like Amazon Dynamo, Cassandra, Microsoft FaRM.
The long-term storage layers are increasingly smart, like Azure Cosmos-DB. They may do things like deduplication, compression, image segmentation and tagging, creation of secondary objects, etc. Often they are backed by massive long-term storage layers like Azure Pelican.
Then of course we also have standard ways to talk to databases, pre-computed files, other kinds of servers and services, back-end systems that can run MapReduce (Hadoop) or do big-data tasks, etc.
The heavy lifting is hardware accelerated using GPU, TPU, FPGA and similar technologies, and as much as possible, we move data using RDMA and store it into memory-mapped non-volatile memory units (SSD or the newer 3D-XPoint NVMs like Optane).

Whew! I hope you are still with me...

The nice thing about this complex but rather "standard" structure is that the developer simply writes a few event handlers for new web requests and most of the rest is automated by the AWS Lambda, Google Tensor Flow or Azure Functions environment. Learning to work in this model is a bit of a challenge because there is a near total lack of textbooks (my friend Kishore Kumar is thinking of writing one), and because the technologies are still evolving at an insane pace.

The big players have done what they can to make these things a little easier to use. One common approach is to publish a whole suite of case-study "demos" with nice little pictures showing the approach, like you would find here for Azure, or here for AWS. In the best cut-and-paste fashion, the developer just selects a design template similar to what he or she has in mind, downloads the prebuilt demo, then customizes it to solve their own special problem by replacing the little functions with new event handlers of his or her own design, coded in any language that feels right (Python is popular, but you typically get a choice of as many as 40 popular options including JavaScript), and that will run in a little containerized VM with very fast startup -- often 1ms or less to launch for a new request.

This is the opposite of what we teach in our undergraduate classes, but for the modern cloud is probably the only feasible way to master the enormous complexity of the infrastructures.

So... with this out of the way, what's the excitement about the intelligent edge (aka active edge, reactive edge, IoT edge...)?

The key insight to start with is that the standard cloud isn't a great fit for the emerging world of live machine-learning solutions like support for self-driving cars, smart homes and power grids and farms, you name it. First, if you own a huge number of IoT devices, it can be an enormous headache to register them and set them up (provisioning), securely monitor them, capture data privately (and repel attacks, which can happen at many layers). Next, there is an intense real-time puzzle here: to control self-driving cars or drones or power grids, we need millisecond reaction times plus accurate, consistent data. The existing cloud is more focused on end-to-end web page stuff where consistency can be weak and hence the fast reactions can use stale data. So CAP is out the window here. Next, we run into issues of how to program all of this. And if you solve all of this in the cloud, you run into the question of what to do if your farmer happens to have poor connectivity back to the cloud.

So the exciting story about Azure IoT Edge was that Microsoft seems to have tackled all of this, and has a really coherent end-to-end vision that touches on every element of the puzzle. This would be endless if I shared everything I learned, but I'll summarize a few big points:

They have a clean security and provisioning solution, and a concept of IoT life cycle with monitoring, visualization of "broken stuff", ways to push updates, etc.
They work with ISPs to arrange for proper connectivity and adequate bandwidth, so that applications can safely assume that the first-hop networking won't be a huge barrier to success.
They have a concept of an Azure IoT "hub" that can run as a kind of point-of-presence. Apparently it will eventually even be able to have a disconnected mode. Many companies operate huge numbers of PoP clusters and a smaller number of cloud data centers, so here Azure IoT Edge is taking that to the next level and helping customers set those up wherever they like. You could imagine a van driving to a farm somewhere with a small rack of computers in it and running Azure IoT "hub" in the van, with a connection while the van is back at the home office, but temporarily disconnected for the couple of hours the company is working at that particular farm.
Then the outer level of Azure itself would also run the Azure IoT edge framework (same APIs) but now with proper connectivity to the full cloud. And the framework has a strong emphasis o topics dear to me like real-time, ways of offering consistency, replication for parallelism or fault-tolerance, etc. I'm looking at porting Derecho into this setting so that we can be part of this story, as a 3rd party (open source!) add-on. They have a plan to offer a "marketplace" for such solutions.

As a big believer in moving machine learning to the edge, this is the kind of enabler I've been hoping someone would build - right now, we've lacked anything even close, although people are cobbling solutions together on top of Amazon AWS Lambda (which perhaps actually is close, although to me has less of a good story around the IoT devices themselves), or Google Tensor Flow (which is more of a back-end story, but has some of the same features). As much as I love Spark/Databricks RDDs, I can't see how that could be an IoT story anytime soon.

So I plan to dive deep on this stuff, and will share what I learn in the coming year or so! Stay tuned...

Thursday, 9 August 2018

Magical Thinking and the Logical Foundations of BlockChains

During the past few years, I've been exposed to an unrelenting drumbeat for BlockChains. The level of enthusiasm for this model, and the commercial mania around it, have all the elements of a "market bubble". Just yesterday I saw a quote from a BlockChain/CyberCoin billionaire who believes that "BlockChain will replace the Internet." Really? But search for that phrase and you'll actually find that this guy is saying something many people believe. Rational or not, there is a huge community totally convinced that the future will be a giant BlockChain.

The BlockChain buzz was evident at the recent conference I attended, where one speaker told us about a Berkeley spin-off based on BlockChain: Oasis, which just landed $45M in first-round "seed" funding. Just think about that for a moment: how can such a number be justified? I'm a skeptic.

Oasis apparently plans to build a secure infrastructure using BlockChain as the storage solution, but entirely secure from the ground up. Presumably this is one reason Oasis needed so much cash: most companies these days just run on a public cloud like Azure or Amazon (or any of the others). Building from the ground up will be expensive, but would let Oasis avoid a pitfall other startups share: because its competitors depend on existing datacenters, they are exposed to whatever security flaws those cloud platforms embody.

But can one build a secure data center from the ground up? Let's focus purely on storage, since this seems to be the essence of the Oasis plan. Could one build a new kind of secure data center that (only) provides secure storage using BlockChain, built from the ground up?

The first step is to reject the permissionless BlockChain model, which is too weak to guarantee freedom from rollbacks even years after a transaction seemingly commits: permissionless BlockChain systems with anonymous servers are insecure by design. We want a minimal BlockChain solution, but if you take minimal to mean "anonymous, globally replicated, permissionless", my answer is that "it can't be done: it is impossible to create a trustworthy platform with that mix of properties and technologies."

Fortunately, the permissioned model avoids this risk. Moreover, it seems reasonable to assume that if the goal is to offer a data-center BlockChain product, the data center operator can control which machines can operate the solution. This moves us from the BitCoin style of anonymous, amorphic, fully decentralized BlockChain to a more standard model of append-only files used by customers like banks.

Next, we should perhaps reject a standard cloud that offers encrypted append-only files. This is an interesting step in the analysis because a block chain really is just a secure append-only file, no matter what anyone might tell you (secured using SHA 256 hashes or similar block hashes, with proof-of-work if the system is permissionless, and then with the signatures entangled to prevent tampering). Any file system could play that role, if you code the needed logic to generate records formatted this way and with the required chain of attestation. Amazon and Azure and other cloud companies already offer extremely secure storage layers, including BlockChain services. But as noted, they do depend on other elements of the respective datacenter systems. So out with the old, and in with the new!

Now, without knowing anything about the proprietary protocols that Oasis is presumably designing, I can't say anything about how they plan to guarantee correctness. But I can tell you how I would do it. I would use a form of Paxos, and because I would want extreme speed, would go with the version of durable Paxos that we implemented in Cornell's Derecho system. If I were chief architect at Oasis, I might want to build my own software (hence not use Derecho), but I would certainly adopt the Paxos specification, and prove that my software implements it.

Of course being a Derecho zealot, I'm going to make the case that using Derecho might be the smart move, even if you might have preferred to roll your own.

First, I should note that by using Paxos, Derecho is able to leverage decades of work on proofs of correctness -- Derecho was implemented by fusing a proved-correct version of Paxos integrated with a proved-correct version of the virtual synchrony membership management model and a new reliable (but not atomic) multicast layer. Then all of the data movement steps were mapped to the available storage (SSD or 3-D XPoint) and communications technologies (RDMA or TCP) in an efficient way. Finally, Derecho uses specialized optimizations for code paths that turn out to be performance-critical.

Next, it is worth noting that Derecho is open source, hence can benefit from community contributions over time, and also that the system is extremely fast -- the fast such system ever created. Further, it actually achieves minimal bounds for Paxos (Derecho's protocols are "optimal" in a formal sense). Of course one can talk about smaller constants and so forth, but given the speed of the system, there is an obvious case to be made that the constants aren't showing any sign of being unreasonably large. So Derecho is a genuinely interesting option! Heck, maybe my students and I should try and raise $50M or so and jump into the commercial space!

But now we run into the first of a series of issues: the protocols and proofs we are leveraging were all created in the usual way: on paper, by hand. The mapping didn't modify the underlying logic, yet it required to adapt them to modern networking is by hand, too, and hence also needed to be proved by hand. The same can be said about our dozens of optimizations.

Is there really a sensible way to argue that all of these hand-written proofs be accepted as somehow being "more trustworthy" than Amazon AWS or Azure? After all, those are companies are serious about specifications too, and further, both have invested hundreds of millions of dollars in their testing and Q/A process. Derecho is remarkably robust, and we would point to all those proofs as part of the reason, but even so, we do find bugs in our logic.

Now, if I had the money, one option might be to harden Derecho. Aggressive testing, deep quality assurance and better documentation would be a breeze. Starting with such a strong prototype, we could quickly end up with a professional-quality tool. In fact I actually hope to do this, in the next year or so, if Derecho users are able to chip in to help with the costs.

But perhaps you still wouldn't be satisfied. And indeed, today's state of the art goes much further: the very best approach is to only run code extracted from a machine-checked proof. In effect, we compile the proof into the running system and take the developers entirely off the development path.

This, though, turns out to be infeasible for software as elaborate as Derecho, and would be even harder for whatever Oasis really plans to build. The issue here is that as of today, the provers can't handle the style of code that we needed to use in order to create Derecho, and any full data-center infrastructure would have 10x more such code, and perhaps far more than just 10x.

Today's best provers actually can handle automated extraction of Paxos code from correctness proofs (this has been done using NuPRL here at Cornell, or IOA and Larch in Nancy Lynch's group at MIT, or IronFleet, using Dafny at Microsoft). The resulting solutions are very robust in the sense that the logical proof structures that result are rock solid -- for what they cover. Unfortunately, however, the resulting code is incredibly slow compared to Derecho. Moreover, infrastructure aspects are much harder to formalize than data replication and consistency, so in some sense, Paxos is the "easy" part of the story. Can we specify the system, fully? Is the specification itself bullet-proof? Bootstrap? Other management tasks? These steps are much harder than many people would expect. They may not even be possible, in some cases!

This same pattern is evident in many projects that have formalized aspects of operating systems or runtime environments. At MIT, Nick Zeldovich famously proved a very simple file system correct a few years ago. It ran on Linux, but there is a Linux u-kernel called SEL4 that has been proved correct, and while it doesn't cover all of Linux, SEL4 probably has enough stuff in it to support Nick's provably correct file system.

Then you could perhaps use the proved version of the C compiler to compile the thing (C++ is considered to be too complex for existing provers). Even better would be to just build the whole thing in a language like RUST or Dafny, where proof is much more central to the coding and compilation model. With such help, you may actually manage to create a proved solution, for parts of your full system.

But without exception, you'll end up with slow code compared to the best possible, and will have solved just a portion of the entire datacenter infrastructure. We are decades from having data-center scale, provably correct , ultra-performant software systems. Perhaps we'll never get there.

Moreover, even if we did manage to overcome these barriers, we would run into further questions.

One big issue is the hardware. Think about any hardware component in the entire data center. The routers. Printers. Digital telephone systems. Storage devices.

$45M may seem like a huge amount of money, but in fact it is a tiny drop in the bucket when you consider that companies like Intel spend billions on their VLSI chip fabrication factories. So there is simply no question that Oasis will end up using components some other company created.

The problem is that these components tend to be software-defined, and this is becoming more and more of a standard story: Almost all hardware components have general-purpose, highly capable processors within them. An entire separate computer, with its own memory, network interfaces and code.

Indeed, if you needed one CPU and DRAM unit just to operate the hardware, why not include two on the same board, or chip? Who would notice? And if you do this, why not drop malicious code into the second unit? Your printer company may not even realize that it is selling compromised devices: I've heard of network chips that had entire secondary networking infrastructures built in, operated by entire stealth operating systems. You can monitor your network as closely as you like. You would never notice the stealth network, and its control logic!

So you can perhaps make the higher layers provably secure, but this issue of trust in the hardware limits the degree that the resulting system could be secure.

If you run a secure solution on compromised hardware, all bets are off. The situation is a bit better if you trust the hardware: Intel has an approach called SGX that can do a bit better, and perhaps Oasis plans to leverage it, but if so, they will face performance challenges. Sadly, SGX is quite slow.

But suppose they pull all of this off: a ground-up datacenter solution, minimal trust in the hardware, offering a BlockChain storage layer to customers. Now we run into a new puzzle: the issue arises of how to draw the boundary between the trusted storage solution and the customer's application.

The problem here has to do with composing a trusted application with a trusted storage layer through some form of less-trusted layer of intermediary logic, like the runtime associated with the programming language. Modern applications are coded in standard languages like Java, Python, C++, Ruby. They use databases from big vendors like Oracle, web servers like Apache, and so forth... and each of brings its own millions of lines of logic and its own runtime environment. Those presumably have flaws, hence an attacker could potentially compromise the application by compromising some element of the compiler or runtime, even if other aspects of the datacenter BlockChain storage solution magically were iron-clad.

And this is why I'm a skeptic: I don't see how such a picture can be secured, not today, and probably not in my lifetime.

I actually do believe in security, but I think that in modern systems, we get much more protection from diversity and multiple layers of monitoring than we do from automated techniques like verification. We should definitely do verification where we can -- for systems like Derecho, it offers a path that can slash bug rates and greatly improve confidence in the correctness of the system relative to the promises it can legitimately make. But to me we oversell the power of verification and proofs if we go further and allow people to believe that we have discovered a magic way to carry this idea to the limits and "prove the whole thing", whatever that thing may be. BlockChains don't change this reality.

The Oasis folks will presumably read this blog, and I should emphasis that it isn't a personal criticism. I'm a huge fan of the Berkeley security team and have been amazed by their accomplishments. Amazing work has been done. But even so, and without being hostile in any way at all, I do think we all share a need to be frank with the public about what technology can, and cannot, accomplish.

BlockChains are being oversold as a trivial way to get perfect security -- not by Oasis, but by the popular technology press, which seems to have a very dim understanding of the concept. The press seems to think BlockChains are somehow "more" than secure file systems. Yet if anything, they are "less"!

Perhaps we are seeing a superposition of two elements. Clearly, we do have a community that was unaware of the concept of a tamper-proof append-only log protected using cryptography. All sorts of folks who work in government, health care, manufacturing clearly feel that this is a revelation, and offers an answer to all their security worries. And quite possibly some were actually not aware that we can use file systems in secure ways -- I'm glad they know, now, and if BlockChain helps them conceive of this, I'm all for it.

Then we have hype, driven by cryptocurrency valuations in the stratosphere, a technology press endlessly eager for the next big thing, and investors keen to make a killing. In this marketplace, I see a lot of value that companies can bring to the table, particularly ones with exciting ideas for packaging this stuff to be useful, and some of this value is very real: HyperLedger and Ethereum and related tools are genuinely powerful.

But just the same, we need to stop claiming that BlockChain is a revolutionary invention.

My worry is that by overselling the same old file systems to a naïve community of infrastructure owners, we may end up with systems that are actually less secure than what they replace. After all, a platform like Azure is actually remarkably secure today, managed professionally by a company obsessed with security (and nobody is paying me to say this, by the way!), and has such a diversity of technologies deployed that compromising it in a major way would be quite hard. If some hospital were to abandon that and jump to BlockChain as its file system, the illusion of security may have replaced much stronger real security. True, today's reality has its limits, but in fact I would be far more comfortable seeing medical records hosted on Azure or AWS, than abandoning these professional solutions in favor of a storage product that uses BlockChain as a magic wand to solve all problems.

But clearly, the current climate (especially the technology press) is prone to magical thinking, and a bit weak on just what BlockChains are, and how they work, and what their logical foundations turn out to be. And in light of that, I suppose that the amazingly high first round of Oasis investment makes a kind of sense. A bubble? Definitely. And yet all valuations are measures of market sentiment. So perhaps not an unreasonable bubble, given the modern business climate and the craze that BlockChain has engendered.