Friday 24 February 2017

On systematic errors in complex computer programs

Society as a whole, including many computing professionals, seems to assume that mistakes made by unsupervised deep learning and other forms of machine-learning will be like Heisenbugs: rare, random, and controllable through a more professional code development and testing process. This belief underlies a growing tendency to trust computing systems that embody unknown (indeed, unknowable) machine-learned behaviors.

Why is this a concern?  

In many parts of computer science, there is a belief in "ultimate correctness": a perception that if we merely formalize our goals and our algorithms and successfully carry out a proof that our code accomplishes the goals, then we will have created a correct artifact.  If the goals covered a set of safety objectives, then the artifact should be safe to use, up to the quality of our specification.  Highly professional development practices and extensive testing strengthen our confidence; model-checking or other forms of machine-checked proofs that run on the real software carry this confidence to the point at which little more can be said, if you trust the specification.

Yet we also know how unrealistic such a perspective can be.  Typical computing systems depend upon tens or hundreds of millions of lines of complex code, spread over multiple modules, perhaps including firmware programs downloaded into the hardware.  All of this might then be deployed onto the cloud, and hence dependent on the Internet, and on the cloud data-center's thousands of computers. The normal behavior of such programs resides somewhere in the cross-product of the individual states of the component elements: an impossibly large space of possible configurations. 

While we can certainly debug systems through thorough testing, software is never even close to flawless: by some estimates, a bug might lurk in every few tens or hundreds of lines of logic.  The compiler and the operating system and various helper programs are probably buggy too.  Testing helps us gain confidence that these latent bugs are rarely exercised under production conditions, not that the code is genuinely correct.

Beyond what is testable we enter the realm of Heisen-behaviors: irreproducible oddities that can be impossible to explain, perhaps caused by unexpected scheduling effects, by cosmic rays that flip bits, or by tiny hardware glitches.  The hardware itself is often better modelled as probabilistic than deterministic: 1 + 1 certainly should equal 2, but perhaps sometimes the answer comes out as 0, or as 3.  Obviously, frequent problems of that sort would make a machine sufficiently unreliable that we would repair or replace it.  But an event that happens perhaps once a week?  Such problems often pass unnoticed.

Thus, ultimate correctness is elusive; we know this, and we've accommodated to the limitations of computing systems by doing our best to fully specify systems, holding semi-adversarial code reviews aimed at finding design bugs, employing clean-room development practices, then red-team testing, then acceptance testing, then integration testing.  The process works surprisingly well, although patches and other upgrades commonly introduce new problems, and adapting a stable existing system to operate on new hardware or in a new setting can reveal surprising unnoticed issues that were concealed or even compensated for by the prior pattern of use, or by mechanisms that evolved specifically for that purpose.

The puzzle with deep learning and other forms of unsupervised or semi-supervised training is that we create systems that lack a true specification.  Instead, they have a self-defined behavioral goal: reinforcement learning trains a system to respond to situations resembling ones it has seen before by repeating whatever response dominated in the training data.  In effect: "when the traffic light turns yellow, drive very fast."

Thus we have a kind of autonomously-learned specification, and because the specification is extracted automatically by training against a data set, the learned model is inherently shaped by the content of the data set.

Train such a system on a language sample in which plurals always end in "s", and it won't realize that "cattle" and "calamari" are plural.  Train it on images in which all the terrorists have dark hair and complexions, and the system will learn that anyone with dark hair or skin is a potential threat.  Teach it to drive in California, where every intersection either has stop signs on one or both streets, or has a traffic signal, and it won't understand how to drive in Europe, where many regions use a "priority to the right" model, whereby incoming traffic (even from a small street) has priority over any traffic from the left (even if from a major road). 
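
To make the failure mode concrete, here is a deliberately tiny C++ illustration (hypothetical code, not a real learning system): a "plural detector" whose training sample happens to contain only regular plurals.  The rule it extracts, "plural means ends in s", then fails on every irregular plural it meets, deterministically rather than at random.

    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy "training": find a single-character suffix rule that is consistent
    // with every labeled example in the training set.
    char learn_plural_suffix(const std::vector<std::pair<std::string, bool>>& data) {
        for (char c = 'a'; c <= 'z'; ++c) {
            bool consistent = true;
            for (const auto& [word, is_plural] : data)
                if ((word.back() == c) != is_plural) { consistent = false; break; }
            if (consistent) return c;   // the first rule that fits the data wins
        }
        return '\0';
    }

    int main() {
        // Every plural in the training sample happens to be regular.
        std::vector<std::pair<std::string, bool>> train = {
            {"cat", false}, {"cats", true}, {"dog", false}, {"dogs", true}};
        char suffix = learn_plural_suffix(train);   // learns: plural <=> ends in 's'

        // Deployment: the learned rule fails on every irregular plural, every time.
        for (const std::string& w : {"cars", "cattle", "calamari", "oxen"})
            std::cout << w << " -> "
                      << (w.back() == suffix ? "plural" : "singular") << "\n";
    }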

Machine learning systems trained in this way conflate correlation with causation.  In contrast, human learning teases out causal explanations from examples.  The resulting knowledge is different in kind from the models that today's machine learning technologies extract from training data, no matter how impressive their pattern matching may be.

Human knowledge also understands time, and understands that behavior must evolve over time.  Stephen Jay Gould often wrote about being diagnosed in mid-career with a rare abdominal cancer (mesothelioma) that was considered invariably fatal.  Medical statistics of the period gave him a life expectancy of no more than a few months, perhaps a year at best.  But as it happened, a new treatment proved to be a true magic bullet: he was cured.  The large-population statistics were based on older therapies and hence were not predictive of outcomes for patients who received the new one.  The story resonated in Gould's case because in his academic life he studied "punctuated equilibria": situations in which a population that has been relatively static suddenly evolves in dramatic ways, often because of some significant change in the environment.  Which is precisely the point.

Those who fail to learn from the past are doomed to repeat it.  But those who fail to appreciate that the past may not predict the future are also doomed.  Genuine wisdom comes not just from raw knowledge, but also from the ability to reason about novel situations in robust ways.

Machine learning systems tend to learn a single set of models at a time.  They squeeze everything into a limited collection of models, which blurs information if the system lacks a needed category: "drives on the left", or "uses social networking apps".  Humans create models, revise models, and are constantly on the lookout for exceptions.  "Is that really a pile of leaves, or has the cheetah realized it can hide in a pile of leaves?  It never did that before.  Clever cheetah!"  Such insights once were of life-or-death importance.

Today, a new element enters the mix: systematic error, in which a system is programmed to learn a pattern, but overgeneralizes and consequently behaves incorrectly every time a situation arises that exercises the erroneous generalization.  Systematic error is counterintuitive, and perhaps this explains our seeming inability to recognize the risk: viewing artificially intelligent systems as mirrors of ourselves, we are blind to the idea that they can actually exhibit bizarre and very non-random misbehavior.  Indeed, it is in the nature of this form of machine learning to misbehave in unintuitive ways!

My concern is this: while we've learned to create robust solutions from somewhat unreliable components, little of what we know about reliability extends to this new world of machine-learning components that can embody systematic error, model inadequacies, or an inability to adapt and learn as conditions evolve.  This exposes us to a wide range of failure modalities never seen before, ones that the industry and the computer science community will be challenged to overcome.  We lack systematic ways to recognize and respond to these new kinds of systematic flaws.

Systematic error also creates new and worrying attack surfaces that hackers and others might exploit.  Knowing how a machine learning system is trained, a terrorist might circulate photoshopped images of himself or herself with very pale makeup and light brown or blond hair, to bind other, harder-to-change biometrics (like fingerprints or iris patterns) to interpretations suggesting "not a threat".  Knowing how a self-driving car makes decisions, a hacker might trick it into driving into a pylon.

Welcome to the new world of complex systems with inadequate artificial intelligences.  The public places far too much confidence in these systems, and our research community has been far too complacent.  We need to open our eyes to the risks, and to teach the public about them, too.  

Wednesday 15 February 2017

C++ just grew up, and wow, what a change!

When I was first learning to program (longer ago than I should admit, but this would have been around 1975), I remember what a revelation it was to start working in C, after first teaching myself languages like Basic, PL/I, SNOBOL and Fortran.  Back then, those were the choices.

When you code in C, you can literally visualize the machine instructions that the compiler will generate.  In fact most C programmers learn to favor loops like these:

     while(--n) { ... do something ... }

or even

    do { ... something ... } while(--n);

because when you code in this style, you know that the compiler will take advantage of the zero/non-zero condition code the processor sets as a side effect of the --n decrement, and hence will generate one fewer instruction than with a for-loop, where you generally end up with an increment or decrement followed by a separate test.
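
For comparison, here is a sketch of the for-loop counterpart the paragraph contrasts with, written in the same illustrative style as the fragments above:

    for (i = 0; i < n; i++) { ... do something ... }

Here the compiler classically emits the increment, then a separate compare against n, then the conditional branch, rather than reusing the condition code the decrement already produced.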

I fell in love with C.  What an amazing shift of perspective: a language that actually lets you control everything!

Ok, admittedly, only a computer science nut could care.  But as it happens, this little example gets to the core of something important about computer systems: we tend to want our systems to be as functional as possible, but also to want them to be as minimal as possible and as fast as possible.  That tension is at the core of the best systems research: you want a maximally expressive component in which every non-essential element is removed, so that the remaining logic can be executed blindingly rapidly (or with amazing levels of concurrency, or perhaps there are other goals).  This sense of minimalism, though, is almost akin to an artistic aesthetic.  Code in C was beautiful in a way that code in PL/I never, ever, was able to match.

Our programming languages have somewhat lost that purity.  When I first learned Python and Java and then C#, their raw power impressed me enormously.  Yet there is also something a little wrong, a little fast-and-loose, about these technologies: they are insanely expensive.  First, they encourage enormous numbers of very small method calls, each of which entails pushing a return address and arguments onto the stack, doing maybe two instructions worth of work, and then popping the environment.  Next, because of automated memory management, object creation is syntactically cheap and easy, yet costly at runtime, and there are unpredictable background delays when the garbage collector runs.  Even if you code in a style that preallocates all the objects you plan to use, these costs arise anyhow, because the runtime libraries make extensive use of dynamic allocation.  Finally, the very powerful constructs these languages offer tend to make heavy use of runtime introspection features: polymorphism, dynamic type checking, and dynamic request dispatch.  All are quite expensive.

But if you are a patient person, sometimes you get to relive the past.  Along came C++ 11, which has gradually morphed to C++ 14, with C++ 17 around the corner.  These changes to C++ are not trivial, but they are a revelation: for the first time, I feel like I'm back in 1975 programming in C and thinking hard about how many instructions I'm asking the compiler to generate for me. 

I'll start by explaining what makes the new C++ so great, but then I'll complain a little about how very hard it has become to learn and to use.
  • A normal program spends a remarkable amount of time pushing stuff onto the stack and popping it back off.  This includes arguments passed by value or even by reference (in Java and C# a reference is like a pointer), return addresses, registers that might be overwritten, etc.  Then the local frame needs to be allocated and initialized, and then your code can run.  So, remember that cool C program that looked so efficient back in 1975?  In fact it was spending a LOT of time copying!  You can ask: how much of that copying was really needed?

    In C++  these costs potentially vanish.  There is a new notation for creating an alias: if a method foo() has an argument declared this way: foo(int& x), then the argument x will be captured as an alias to the integer passed in.  So the compiler doesn't need to push the value of x, and it won't even need to push a pointer: it literally accesses the caller's x, which in turn could also be an alias, etc. 

    With a bit of effort, foo itself will expand inline, and if foo is recursive but uses some form of constant expression to decide the recursion depth or pattern, the compiler can often simulate the recursive execution and generate just the data-touching code from the unwound method.
  • With polymorphic method calls in Java and C#, a runtime dispatch occurs when the system needs to figure out the actual dynamic types of the variables used in a method invocation and match that to a particular entry point.  In C++, templates give you much of the same flexibility, but the resolution that matches caller and callee occurs at compile time (unless you explicitly opt into virtual dispatch).  Thus at runtime, only data-touching code runs, which is far faster.
  • Although C++ now has dynamic memory management, it comes from a library implementing what are called smart pointers: handle objects such as shared_ptr (reference-counted) and unique_ptr (single owner) that the language manages automatically.  When a smart pointer goes out of scope, the compiler generates a call to its destructor, which decrements the reference count (or, for a unique owner, simply releases the object) and destroys the managed object once nothing refers to it any longer.  This gives a remarkable degree of control over memory allocation and deallocation, once you become familiar with the required coding style.  In fact you can take full control and reach a point where no dynamic allocation or garbage collection would ever take place: you preallocate objects and keep old copies around for reuse.  The libraries, unlike the ones in Java and C#, don't create objects on their own, hence the whole thing actually works.
  • C++ can do a tremendous amount of work at compile time, using what are called constant expression evaluation and variadic template expansions.  Basically, the compile-time behavior of the language is that of a full program that you get to write, and that generates the code that will really execute at runtime.  All the work of type checking occurs statically, many computations are carried out by the compiler and don't need to be performed at runtime, and you end up with very complex .h header files, but remarkably "thin" executables.  (A small sketch after this list pulls a few of these ideas together.)
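
Here is that sketch: a minimal, illustrative fragment (the names are made up; real code, in Derecho or elsewhere, is far more elaborate) showing a reference parameter that avoids copying, a constexpr computation the compiler folds away entirely, and a smart pointer whose cleanup is generated deterministically at scope exit.

    #include <iostream>
    #include <memory>
    #include <vector>

    // Pass by reference: no copy of the vector is pushed onto the stack.
    void scale(std::vector<double>& v, double factor) {
        for (double& x : v) x *= factor;
    }

    // constexpr: the compiler can evaluate calls to this function at compile
    // time, so the recursion never happens at runtime.
    constexpr long factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

    int main() {
        static_assert(factorial(10) == 3628800, "computed entirely by the compiler");

        // A smart pointer: the vector it owns is destroyed deterministically
        // when 'data' goes out of scope -- no garbage collector involved.
        auto data = std::make_unique<std::vector<double>>(1000, 1.0);
        scale(*data, 2.5);
        std::cout << (*data)[0] << "\n";   // prints 2.5
    }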

So with so much fantastic control, what's not to love?
  • The syntax of C++ has become infuriatingly difficult to understand: a morass of angle brackets and & operators and pragmas about constants and constant expressions that actually turn out to matter very much.
  • The variadic template layer is kind of weird: it does a syntax-directed style of recursion in which there are order-based matching operations against what often looks, at a glance, like an ambiguous set of possible matching method invocations (a small sketch of this recursion style appears after this list).  A misplaced comma can lead the compiler off on an infinite loop.  At best, you feel that you are coding in a bizarre fusion of Haskell or OCaml with the old C++, which honestly doesn't make for beautiful code.
  • As a teacher who enjoys teaching object oriented Java classes, I'm baffled by the basic question of how one would teach this language to a student.  For myself, it has taken two years to start to feel comfortable with the language.  I look forward to seeing a really good textbook!  But I'm not holding my breath.
  • The language designers have been frustratingly obtuse in some ways.  For example, and this is just one of many, there are situations in which one might want to do compile-time enumerations over the public members of a class, or to access the actual names the programmer used for fields and methods.  For obscure religious reasons that date back to the dark ages, the standards committee has decided that these kinds of things are simply evil and must never, ever, be permitted. 

    Why are they needed?  Well, in Derecho we have persistent template classes and would normally want to name the files that hold the persisted data using the name of the variable the data corresponds to, which would be some sort of scoped name (the compile-time qualified path) and then the variable name.  No hope. 

    And one wants to iterate the fields in order to automatically serialize the class.  Nope. 

    And one wants to iterate over the methods because this would permit us to do fancy kinds of type checking, like to make sure the client invoking a point-to-point method isn't accidentally invoking a multicast entry point.  Sorry, guy.  Not gonna happen.
  • The language designers also decided not to support annotations, like the @ notation in Java, or the C# [ something ] notation that you can attach to a class.  This is a big loss: annotations are incredibly useful in polymorphic code, and as C++ gets more and more powerful, we get closer and closer to doing compile-time polymorphic logic.  So why not give us the whole shebang?
  • There isn't any way to stash information collected at compile time.  So for example, one could imagine using variadic templates to form a list of methods that have some nice property, such as being read-only or being multicast entry points.  That list would be a constexpr: generated at compile time.  But they just don't let you do this.  No obvious reason for the limitation.
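
For readers who haven't met variadic templates, here is a small illustration of the recursion style the second bullet above describes (the names are made up): the compiler peels off one argument per step, matching overloads in order, and the whole recursion is resolved at compile time.

    #include <iostream>
    #include <string>

    // Base case: nothing left to print.
    void print_all() { std::cout << "\n"; }

    // Recursive case: peel off the first argument, then "recurse" on the rest.
    // Each level is resolved by the compiler, not at runtime.
    template <typename First, typename... Rest>
    void print_all(const First& first, const Rest&... rest) {
        std::cout << first << " ";
        print_all(rest...);        // expands to a call with one fewer argument
    }

    int main() {
        print_all(42, 3.14, std::string("hello"));   // prints: 42 3.14 hello
    }
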
I could complain at greater length, but I'll stop with the remark that even with its many limitations, and even with the practical problem that compilers have uneven support for the C++ 17 features, the language is on a tear and this is clearly the one language that will someday rule them all.

In our work on Derecho, we've become C++ nuts (led by Matt Milano, an absolutely gifted C++ evangelist and the ultimate C++ nut).  Our API is not trivial and the implementation is quite subtle.  Yet under Matt's guidance, 95% of that implementation involves compile-time logic.  The 5% left that has to be done at runtime is incredibly "thin" code, like the while(--n) { ... } loop: code that genuinely has to execute at that point in time, that actually touches data.  And because this code tends to be very efficient and very highly optimized, just a few instructions often suffice.  All of Derecho may be just instructions away from the RDMA NIC taking control and moving bytes.

Wow...  I feel like it is 1975 again (except that there is no way to fool myself into believing that I'm 20 years old.  Sigh)  But at least we can finally write efficient systems code, after decades of writing systems code in languages that felt like plush goose-down quilts: thick and soft, and nowhere near reality...

Systems engineering viewed as a science

At Cornell University, where I've done my research and taught since 1982, we've always had a reputation for being a top theory department, and because Cornell is relatively small compared to some of our peers, and because many of my colleagues are famous for rigorous systems theory, this reputation includes systems.  I wouldn't call it a bad thing: I actually like to specify protocols as carefully as possible and to tease out a correctness argument, and my work has often benefitted from rigor.

And yet I'm reminded of something Roger Needham loved to stress.  You've probably heard the story, but Roger headed the most widely known systems group in Europe for many years, at Cambridge University, and he went on to found the Microsoft Research Laboratory in Cambridge.

Roger's health finally failed, and his colleagues gathered to celebrate his accomplishments.  This came near the end of his life, when the cancer he was suffering from had left him very frail.  As a result, Roger was unable to attend in person, but he did send a video, and in this video we see him in a wheelchair holding an engineer's hard hat on his lap.  He starts out by saying a few words about his career (which included a wide range of work, some very theoretical, some very practical).  And then he puts the hat on.  Roger looks directly at the camera and says that he hasn't very long now, and wants people to remember him this way: wearing a hard hat, very much the engineer. 

For Roger, computer systems research was an engineering discipline first: our primary obligation is to really build the things we invent, and to build production-quality software that people will really want to use.  Roger loved ideas, but for him, an unimplemented idea was inferior to a working one, and a working system that people use was the gold standard.  Throughout his career, Roger repeatedly acknowledged the value of rigor and used theoretical tools to achieve rigor.  But the key thing is that for him, theory in the systems arena was a tool, secondary to the actual engineering work itself.  The real value resided in the artifact: the working system.

For me this was one of the most iconic images in my entire career: Roger wearing that hat and underscoring that we are at our best when we build things of high value: useful things, beautifully designed concrete implementations.  And there is a genuine aesthetic of systems creation too: a meaningful sense in which great systems are beautiful things.

Today, I see a huge need to elevate our view of systems engineering: rather than thinking of it as mere implementation work, we need to begin to appreciate the science and beauty of great systems work.  There is an inherent value to this act of creation: building really good software that solves some really hard problem, having it work, and seeing it used.  We're at our best when we create artifacts.

When I visit people at the French air traffic control agency, I'm kind of awed that the software we built for them in 1990 is still working today without a single unmanaged failure, in 27 years.  The NYSE floor trading system I developed ran the show for ten years, and there was never a single disruption to trading in all that time.  Every time you install Oracle, watch closely: in the middle of the install script Oracle sets up my old Isis toolkit system as its network management tool.  And this has worked for nearly 30 years.  Nobody phones me to ask how to fix obscure bugs.  (Of course, once Derecho is finally out there, I bet many will call to ask how to migrate to it).

We built these really difficult engineered infrastructures with incredibly demanding specifications, came up with a beautiful and elegant model (virtual synchrony, which turns out to be a variation on the Paxos state machine replication model), and it actually worked!  This is as good as it gets.

Our field continues to struggle with the tension between its mathematically-oriented theory side and its roots in systems engineering.  One sees this again and again.  Papers are brushed to the side by conferences because the contributions are primarily about sound engineering and great performance: "minor and incremental" aspects in the eyes of some PC member (who has probably never coded a large C++ program in his life)!  Funding programs somehow assume that the massive data collection infrastructures that will be needed as hosting environments for their fancy new machine learning technologies are trivial and will just build themselves.  You write a reference letter for a student, and back comes an email query: yes, ok, the student is a gifted distributed systems engineer.  But do they have a talent for theory?  And if not, where's the originality?  I've become accustomed to that sinking feeling of needing to translate my enthusiasm for systems engineering to make sense to a person who just doesn't understand the creativity and originality that the best systems research demands.

So here's my little challenge to the community: we need to start going out of our way to seek out and accept papers on the best systems, and to deliberately validate and celebrate the systems engineering side of the field as we do so.  Over time, we need to recreate a culture of artifacts: by deliberate choice, we need to tilt the field back towards valuing systems to a greater degree.  Roger was right: we're at our best when we wear those hard hats. 

A few concrete suggestions:
  • Let's start to write papers that innovate by revealing clever ways to build amazing things.
  • Those of us in a position to do so should press program committees to set aside entire sessions to highlight real systems: practical accomplishments that highlight the underlying science of systems engineering. 
  • We should be giving awards to the best academic systems research work.  Somehow, over time, prizes like the ACM software systems prize started going purely to massive projects done by big teams in industry (which is fine, if ACM wants to orient the prize that way).  But we also need prizes to recognize the best systems built in academic and research settings.  
Why do these things?  Because we're at our best when we create working code, and when we teach others to appreciate the elegance of the code itself: the science behind the engineering.  Systems building is what we do, and we should embrace that underlying truth.

Thursday 9 February 2017

Programming in the basement

One thing I've discovered during my sabbatical is that nearly every hardware technology you'll find in future computing systems is independently programmable:
  • The multi-core computer itself.
  • The Network Interface Cards (NICs) attached to it.  Companies like Mellanox are dropping substantial amounts of computing power right into the NIC, although not every NIC will be fully capable.  But at least some NICs are going to have substantial multicore chips onboard, with non-trivial amounts of memory and the ability to run a full operating system like Linux or QNX (I'm listing two examples that have widely used "embedded" versions, designed to run in a little co-processor -- these are situations where you wouldn't expect to find a display or a keyboard).
  • The storage subsystem.  We're seeing a huge growth in the amount of data that can be held by a NAND (aka SSD or flash) disk.  Rotational disks are even larger.  These devices tend to have onboard control units to manage the medium, and many have spare capacity.  Using that capability makes sense: because of the size, it is less and less practical for the host computer to access the full amount of data, even for tasks like checking file system integrity and making backups.  Moreover, modern machine-learning applications often want various forms of data "sketches" or "skims": random samples, or statistical objects that describe the data in concise ways.  Since customers might find it useful to ask the storage device itself to compute these kinds of things, or to shuffle data around for efficient access, create backups, and so on, manufacturers are playing with the idea of augmenting the control units with extra cores that could be programmed by pushing logic right into the lower layers of the storage hierarchy.  A machine learning system could then "send" its sketch algorithm down into the storage layer, at which point it would have an augmented storage system with a native ability to compute sketches, and similarly for other low-level tasks that might be useful (a small code example of one such computation appears after this list).
  • Attached co-processors such as NetFPGA devices, GPU clusters, systolic array units.
  • The visualization hardware.  Of course we've been familiar with graphical co-processors for a long time, but the sophistication available within the display device is growing by leaps and bounds.
  • Security hardware.  Here the state of the art would be technologies like Intel's SGX (Software Guard Extensions), which can create secure enclaves: the customer can ship an encrypted virtual image to a cloud or some other less-trusted data center operator, where computation will occur remotely, in standard cloud-computing style.  But with SGX, the cloud operator can't peer into the executing enclave, even though the cloud is providing the compute cycles and storage for the customer's job.  Think of SGX as a way to create a virtually private cloud that requires no trust in the cloud owner/operator at all!  Technologies like SGX offer substantial programmability.
  • Other types of peripherals.  Not only do your line printer and your WiFi router have computers in them, it is entirely possible that your refrigerator does too, and your radio, and maybe your microwave oven.  The oven itself almost definitely does.  I point this out not so much because the average computer will offload computing into the oven, but more to evoke the image of something that is also the case inside your friendly neighborhood cloud computing data center: the switches and routers are programmable and could host fairly substantial programs, the power supplies and air conditioners are programmable, the wireless adaptor on the computer itself is a software radio, meaning that it is programmable and can dynamically adapt the kinds of signals it uses for communication, and the list just goes on and on.
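
As one concrete example of what "pushing logic down" might mean, here is a small, host-side C++ sketch (illustrative only; nothing here is specific to any vendor's device) of reservoir sampling, which produces a fixed-size uniform random sample of a data stream in a single pass.  This is exactly the kind of "skim" a smart storage controller could compute in place rather than shipping the whole volume to the host.

    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <vector>

    // Reservoir sampling: keep a uniform random sample of k records from a
    // stream of unknown length, touching each record exactly once.
    std::vector<uint64_t> reservoir_sample(const std::vector<uint64_t>& stream, size_t k) {
        std::vector<uint64_t> sample;
        std::mt19937_64 rng(42);                  // fixed seed for reproducibility
        for (size_t i = 0; i < stream.size(); ++i) {
            if (i < k) {
                sample.push_back(stream[i]);      // fill the reservoir first
            } else {
                // Replace a current member with probability k / (i + 1).
                std::uniform_int_distribution<size_t> pick(0, i);
                size_t j = pick(rng);
                if (j < k) sample[j] = stream[i];
            }
        }
        return sample;
    }

    int main() {
        std::vector<uint64_t> blocks(1000000);
        for (size_t i = 0; i < blocks.size(); ++i) blocks[i] = i;   // stand-in for "disk" contents
        for (uint64_t b : reservoir_sample(blocks, 5)) std::cout << b << " ";
        std::cout << "\n";
    }
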
The puzzle is that today, none of this is at all easy to do.  If I look at some cutting-edge task such as the navigation system of a smart car (sure, I'm not wild about this idea, but I'm a pragmatist too: other people love the concept and even if it stumbles, some amazing technologies will get created along the way), you can see how two kinds of questions arise:
  • What tasks need to be carried out?
  • Where should they run?
I could break this down in concrete ways for specific examples, but rather than do that, I've been trying to distill my own thinking into a few design principles.  Here's what I've come up with; please jump in and share your own thoughts! 
  • The first question to ask is: what's the killer reason to do this?  If it will require a semi-heroic effort to move your data visualization pipeline into a GPU cluster or something, that barrier to actually creating the new code has got to be part of the cost-benefit analysis.  So there needs to be an incredibly strong advantage to be gained by moving the code to that place.
  • How hard is it to create this code?  Without wanting to disparage the people who invented GPU co-processors and NetFPGA, I'll just observe that these kinds of devices are remarkably difficult to program.  Normal code can't just be cross-compiled to run on them, and designing the code that actually can run on them is often more "like" hardware design than software design.  You often need to do the development and debugging in simulators because the real hardware can be so difficult to even access.  Then there is a whole magical incantation required to load your program into the unit, and to get the unit to start to process a data stream.  So while each of these steps may be solvable, and might even be easy for an expert with years of experience, we aren't yet in a world where the average student could make a sudden decision to try such a thing out, just to see how it will work, and have much hope of success.
  • Will they compose properly?  While visiting at Microsoft last fall, I was chatting with someone who knows a lot about RDMA, and a lot about storage systems.  This guy pointed out that when an RDMA transfer completes and the application is notified, while you do know that software in the end-point node will see the new data, you do not know for sure that hardware such as disks or visualization pipelines would correctly see it: they can easily have their own caches or pipelines and those could have stale data in them, from activity that was underway just as the RDMA was performed.  You can call these concurrency bugs: problems caused by operating the various components side by side, in parallel, but without having any kind of synchronization barriers available.  For example, the RDMA transfer layer currently lacks a way to just tell the visualization pipeline: "New data was just DMA'ed into address range X-Y".    Often, short of doing a device reset, the devices we deal with just don't have a nice way to flush caches and pipelines, and a reset might leave the unit with a cold cache: a very costly way to ensure consistency.  So the insight here is that until barriers and flush mechanisms standardize, when you shift computation around you run into this huge risk that the benefit will be swamped by buggy behaviors, and that the very blunt-edged options for flushing those concurrent pipelines and caches will steal all your performance opportunity!
  • How stable and self-managed will the resulting solution be?  The world has little tolerance for fragile technologies that easily break or crash or might sometimes malfunction.  So if you believe that at the end of the day, people in our business are supposed to produce product-quality technologies, you want to be asking what the mundane "events" your solution will experience might look like (I have in mind things like data corruption caused by static or other effects in a self-driving car, or software crashes caused by Heisenbugs).  Are these technologies capable of dusting themselves off and restarting into a sensible state?  Can they be combined cleanly with other components?
  • Do these devices fit into today's highly virtualized multi-tenancy settings?  For example, it is very easy for MPI to leverage RDMA because MPI typically runs on bare metal: the endpoint system owns some set of cores and the associated memory, and there is no major concern about contention.  Security worries are mostly not relevant, and everything is scheduled in nearly simultaneous ways.  Move that same computing style into a cloud setting and suddenly MPI would have to cope with erratic delays, unpredictable scheduling, competition for the devices, QoS shaping actions by the runtime system, paging and other forms of virtualization, enterprise VLAN infrastructures to support private cloud computing models, etc.   Suddenly, what was easy in the HPC world (a very, very, expensive world) becomes quite hard.
In terms of actual benefits, the kinds of things that come to mind are these:
  • Special kinds of hardware accelerators.  This is the obvious benefit.  You also gain access to hardware functionality that the device API might not normally expose: perhaps because it could be misused, or perhaps because the API is somehow locked down by standards.  These functionalities can seem like surreal superpowers when you consider how fast a specialized unit can be, compared to a general purpose program doing the same thing on a standard architecture.  So we have these faster-than-light technology options, but getting to them involves crossing the threshold to some other weird dimension where everyone speaks an alien dialect and nothing looks at all familiar...  (Sorry for the geeky analogy!)
  • Low latency.  If an RDMA NIC sees some interesting event and can react, right in the NIC, you obviously get the lowest possible delays.  By the time that same event has worked its way up to where the application on the end node can see it, quite a lot of time will have elapsed.
  • Concurrency.  While concurrency creates problems, like the ones listed above, by offloading a task into a co-processor we also insulate that task from scheduling delays and other disruptions.
  • Secret knowledge.  The network knows its own topology, and knows about overall loads and QoS traffic shaping policies currently in effect.  One could draw on that information to optimize the next layer (for example, if Derecho were running in a NIC, it could design its data plane to optimize the flow relative to data center topology objectives.  At the end-user layer, that sort of topology data is generally not available, because data center operators worry that users could take advantage of it to game the scheduler in disruptive ways).  When designing systems to scale, this kind of information could really be a big win.
  • Fault-isolation.  If you are building a highly robust subsystem that will play some form of critical role, by shifting out of the very chaotic and perhaps hostile end-user world, you may be in a position to protect your logic from many kinds of failures or attacks.
  • Security.  Beyond fault-isolation, there might be functionality we would want to implement that somehow "needs" to look at data flows from multiple users or from distinct security domains.  If we move that functionality into a co-processor and vet it carefully for security flaws, we might feel less at risk than if the same functionality were running in the OS layer, where just by allowing it to see that cross-user information potentially breaks a protection boundary.
I'll stop here, because (as with many emerging opportunities), the real answers are completely unknown.  Someday operating system textbooks will distill the lessons learned into pithy aphorisms, but for now, the bottom line is that we'll need engineering experience to begin to really appreciate the tradeoffs.  I have no doubt at all that the opportunities we are looking at here are absolutely incredible, for some kinds of applications and some kinds of products.  The puzzle is to sort out the big wins from the cases we might be wiser just not pursuing, namely the ones where the benefits will be eaten up by some form of overhead, or where creating the solution is just impossibly difficult, or where the result will work amazingly well, but only in the laboratory -- where it will be too fragile or too difficult to configure, or to reconfigure when something fails or needs to adapt.  Maybe we will need a completely new kind of operating system for "basement programming."   But even if such steps are required before we can fully leverage the opportunity, some of those opportunities will be incredible, and it will be a blast figuring out which ones offer the biggest payoff for the least effort!

Tuesday 7 February 2017

Big Infrastructure

It surprises me that for all the talk of big data, and the huge investments rushing to capture niches in the big data space, so little attention is given to the infrastructure that will need to move that data.

Today's internet is a mostly-wired infrastructure paid for by the owners of the end-point systems, be those consumers who have a wireless router in their living room, or consumers with mobile devices that connect wirelessly to the local cell phone operator.

It isn't at all clear to me that either model can work for tomorrow's world in which the Internet of Things will be capturing much of the data and taking many kinds of actions: a world of small devices that will often be installed in the relative "wilderness" of our cities, highways, buildings and other spaces. 

Consider a small sensor playing some sort of role in a smart building: perhaps it monitors temperature and humidity in a particular room.  This little device will need:
  • A stable source of electrical power.  You may be thinking "batteries" and sure, that can work, but this assumes a consumer willing to replace those batteries once a year or so.  If our device wants to generate its own power, that could be possible.  I've seen work on harvesting ambient EM power from background signals, and of course our unit could also have some form of optical cell to convert light to electricity.  But until that question is solved, the IoT world will be a world of things you plug in, which already tells you something about "form factors" and roles.
  • Some idea of where it is located.  If the sensor in the room is telling the house whether or not to run the A/C for that area, or the dehumidifier, clearly that sensor has to know which room it is located in.  Perhaps even which way it is pointing: if you want the lighting for guests on the couch to be just right for reading, or for whatever else they may be up to, you need to be pretty smart about the room.  A smart traffic light wants to know which intersection it is at, on which corner, and which way it is oriented relative to the streets.  Traffic sensors will want to know these things too, even when located in the middle of a block or on a pole high above a freeway.  A smart printer wants to know who is allowed to use it, and your phone wants to know which of the printers it can sense belong to your home, or to your friend's home if you happen to need to print something while hanging out elsewhere.
  • Some idea of role.  Many sensors will be created from the same basic elements (units that can detect light, take photos or video, track moving things in the vicinity or their own motion, measure humidity or sound, etc).  These general-purpose units will be cheap to manufacture in huge volume, but in general will then need to be programmed to do some specific thing that may use just part of the functionality available.
  • Who's in charge.  Small devices will generally operate under the control of more centralized ones, which in turn might talk to still larger-scale controllers.  When you turn the device on, how can it figure this out?
  • How to network back to the boss.  Even a tiny device may have many options available: Bluetooth, various WiFi options, cellular telephony options, you name it.  Which should it use?
Then we run into questions of the network side of the Internet of Things.  Will this really be the Internet, or will the IoT demand some other style of networking? 

For some uses, today's network probably can get us there.  For example, I think one of the biggest things coming soon is delivery of TV-style live content, including various forms of group video conferencing, or live sports events.  Those applications center on moving bytes, and our Internet service providers are pretty motivated to enhance their networks and then sell us more and better bandwidth.  The technology for replicating data rapidly inside the data center is in hand (Derecho!), and the existing last-hop options are probably fast enough, so we can get that video stream to the end user without a massive upgrade: it really could be done.  But TV turns out to be a fairly easy case.

Networking for little devices isn't at all clear.  Here we run into an issue much like the last leg of gigabit networking (more on that below): something enormous needs to be built, yet it currently isn't monetized, even to a limited degree.

A student I've worked with is convinced that standard networking isn't really suitable in many of the environments that the Internet of Things will run in: he points out that the people who service traffic signals aren't networking experts, and yet when traffic lights evolve into smart signaling units, those same people will probably be the ones in charge of replacing broken devices.  We'll need a very different model for how the broken system reports its faults, prioritizes repairs, and how it communicates all this to the infrastructure owner and the repair crew.  Since failures and environmental challenges (like ice, snow, humidity, physical damage...) are going to be common in many settings, and some devices may be in fairly remote and isolated settings, he believes that repair won't be instantaneous either: we'll want networking redundancy, using ideas like duplicate links, RAID-style data coding across the various available network pathways, and fault-tolerant routing.

Security will be an issue too.  It is nice to imagine the benefits of moving towards a world of smart factories or smart transportation, but we'll want this world to have much stronger security than today's Internet (which probably wants much stronger security than it actually possesses!).  Quite likely the sensors in the Internet of Things will need built-in security keys, and the architecture will be such that a given device will only accept instructions from properly authenticated, properly authorized control units.  We'll want to encrypt the data sent and received, both to protect against unauthorized spying, and to ensure that the received commands really originated at a trusted source.

And then all of this will need to connect to the cloud, since the whole point of the Internet of Things will be that by using "big data" tools to learn about environments, we can often optimize them in useful ways: to waste less power, for example, or to personalize the environment so that a room is a bit cool if I'm in it, but a bit warmer when my sister comes for a visit (she's used to a much warmer climate and finds the places I live most of the year frigid).  We might even imagine personalized advice: today is February 7 and I was swimming in the Mediterranean sea off Tel Aviv.  For me, the advice would be that the water was chilly but quite swimmable.  For my sister, same beach, and the message would be "deadly cold, don't even think about it!"  The cloud will be the entity in this picture that builds up the personalization story: Ken likes it kind of chilly; his sister prefers it a bit on the warm side.  So the Internet of Things will feed data to a cloud that in turn takes actions, both through small actuators (like the traffic signal) and through our mobile devices.

We're going to need a new technology for building and operating these systems: a new kind of "Big Infrastructure", more or less the physical counterpart of today's "Big Data" analytics and tools.  But surprisingly little work seems to be underway on creating that technology, or deploying it.

Here's the very biggest puzzle of all: who is going to pay?  In fact the real answer is obvious: in the end the consumer will pay, either directly in the sense of finding the technology so valuable that he or she voluntarily buys it, or indirectly, through municipal taxes and fees that pay for the smart traffic lights, or even by accepting personalized advertising.  Money is invariably the key: spend enough, and most problems can be solved, and the money tends to be where the majority of the people are to be found.

Viewed this way, it becomes clear why the Internet of Things revolution is stalled, at least for the time being.  The value of an Internet of Things is pretty clear.  Once it exists, it will be an immense value generator, so if you could solve the chicken-and-egg puzzle, you would enable a very, very lucrative new world.  But until it exists, we have a version of the so-called last-mile problem: it is very easy to get gigabit networking to within a mile of everyone, maybe even to within 100 meters.  Then the costs simply explode for that last leg, and while we would see a tremendous burst of innovation if we had gigabit networking into our homes, the cost of bridging that last leg is a tremendous barrier to starting to reap the benefits of the faster technology.  Thus almost nobody has gigabit networking now, and the applications that would be enabled by that sort of connectivity aren't yet being created: if you created them, nobody would buy them, because few people have a fast connection.

Earlier I said that live TV is coming, but that's because its bandwidth seems to be within the range that the existing infrastructure can manage.  With a gigabit technology we could do far more, except that as far as I know, nobody has yet figured out how to upgrade that last leg to support it.  And the small-devices version of the problem has the even greater puzzle of lacking an existing paying customer.

Where will the big infrastructure come from?  If you figure this out, let me know: I'll want to be an early investor.