Wednesday 15 February 2017

C++ just grew up, and wow, what a change!

When I was first learning to program (longer ago than I should admit, but this would have been around 1975), I remember what a revelation it was to start working in C, after first teaching myself languages like BASIC, PL/I, SNOBOL and Fortran.  Back then, those were the choices.

When you code in C, you can literally visualize the machine instructions that the compiler will generate.  In fact most C programmers learn to favor loops like these:

    while(--n) { ... do something ... }

or even

    do { ... something ... } while(--n);

because when you code in this style, you know that the compiler will take advantage of the zero/non-zero condition code that the --n decrement leaves in the processor's flags register, and hence will generate one fewer instruction than with a for-loop, where you generally end up with an increment or decrement followed by a separate test.
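A quick sketch to make the off-by-one semantics of the two idioms concrete (the helper names here are mine, for illustration only); note that modern compilers usually generate equally tight code for all of these loop forms anyway:

```cpp
// Count how many times each loop body executes, for n >= 1.

int runs_while(int n) {
    // while (--n): the decrement happens before the test,
    // so the body runs only n - 1 times.
    int count = 0;
    while (--n) { ++count; }
    return count;
}

int runs_do_while(int n) {
    // do { } while (--n): the body runs before the test,
    // so it executes exactly n times.
    int count = 0;
    do { ++count; } while (--n);
    return count;
}
```

So runs_while(5) yields 4, while runs_do_while(5) yields 5: the do-while form is the one that visits every count.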

I fell in love with C.  What an amazing shift of perspective: a language that actually lets you control everything!

Ok, admittedly, only a computer science nut could care.  But as it happens, this little example gets to the core of something important about computer systems: we tend to want our systems to be as functional as possible, but also to want them to be as minimal as possible and as fast as possible.  That tension is at the core of the best systems research: you want a maximally expressive component in which every non-essential element is removed, so that the remaining logic can be executed blindingly rapidly (or with amazing levels of concurrency, or perhaps there are other goals).  This sense of minimalism, though, is almost akin to an artistic aesthetic.  Code in C was beautiful in a way that code in PL/I never, ever, was able to match.

Our programming languages have somewhat lost that purity.  When I first learned Python and Java and then C#, their raw power impressed me enormously.  Yet there is also something a little wrong, a little fast-and-loose, about these technologies: they are insanely expensive.  First, they encourage enormous numbers of very small method calls, which entail pushing return addresses and arguments onto the stack, doing maybe two instructions' worth of work, and then popping the environment.  Next, because of automated memory management, object creation is syntactically cheap and easy, yet costly at runtime, and there are unpredictable background delays when garbage collection runs.  Even if you code in a style that preallocates all the objects you plan to use, these costs arise anyhow, because the runtime libraries allocate objects extensively on their own.  Finally, the very powerful constructs these languages offer tend to make heavy use of runtime introspection features: polymorphism, dynamic type checking, and dynamic request dispatch.  All are quite expensive.

But if you are a patient person, sometimes you get to relive the past.  Along came C++ 11, which has gradually morphed into C++ 14, with C++ 17 around the corner.  These changes to C++ are not trivial, but they are a revelation: for the first time, I feel like I'm back in 1975, programming in C and thinking hard about how many instructions I'm asking the compiler to generate for me.

I'll start by explaining what makes the new C++ so great, but then I'll complain a little about how very hard it has become to learn and to use.
  • A normal program spends a remarkable amount of time pushing stuff onto the stack and popping it back off.  This includes arguments passed by value or even by reference (in Java and C# a reference is like a pointer), return addresses, registers that might be overwritten, etc.  Then the local frame needs to be allocated and initialized, and then your code can run.  So, remember that cool C program that looked so efficient back in 1975?  In fact it was spending a LOT of time copying!  You can ask: how much of that copying was really needed?

    In C++ these costs potentially vanish.  There is a notation for creating an alias: if a method foo() has an argument declared this way: foo(int& x), then the argument x will be captured as an alias to the integer passed in.  So the compiler doesn't need to push the value of x, and it won't even need to push a pointer: the generated code literally accesses the caller's x, which in turn could itself be an alias, and so on.

    With a bit of effort, foo itself will expand inline, and if foo is recursive but uses some form of constant expression to decide the recursion depth or pattern, the compiler can often simulate the recursive execution and generate just the data-touching code from the unwound method.
  • With polymorphic method calls in Java and C#, a runtime dispatch occurs: the system figures out the actual dynamic types of the variables used in a method invocation and matches them to a particular entry point.  In C++, templates give you much the same expressive power, but the resolution that matches caller and callee occurs at compile time (virtual methods still exist, and still dispatch at runtime, but you only pay that cost when you explicitly ask for it).  Thus at runtime, only data-touching code runs, which is far faster.
  • Although C++ now has automatic memory management, it comes from a library implementing what are called smart pointers, such as std::shared_ptr: reference-counted handles that C++ creates and automatically manages.  When such a handle goes out of scope, the compiler calls its destructor, which decrements the reference count on the managed object, and the object itself is destroyed once the count drops to 0.  This gives a remarkable degree of control over memory allocation and deallocation, once you become familiar with the required coding style.  In fact you can take full control and reach a point where no dynamic allocation or garbage collection ever takes place: you preallocate objects and keep old copies around for reuse.  The libraries, unlike the ones in Java and C#, don't create objects behind your back, hence the whole thing actually works.
  • C++ can do a tremendous amount of work at compile time, using what are called constant expression evaluation and variadic template expansion.  Basically, the compile-time behavior of the language is that of a full program that you get to write, and that generates the code that will really execute at runtime.  All the work of type checking occurs statically, many computations are carried out by the compiler and don't need to be performed at runtime, and you end up with very complex .h header files, but remarkably "thin" executables.
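A small sketch of these mechanisms in action (the function names are mine, not from any library): a reference parameter that copies nothing, a constexpr function the compiler evaluates entirely at compile time, and a reference-counted smart pointer:

```cpp
#include <array>
#include <memory>

// Pass by reference: v aliases the caller's array; no copy is made,
// and the loop variable x aliases each element in turn.
void double_all(std::array<int, 4>& v) {
    for (int& x : v) x *= 2;
}

// constexpr: the compiler can fold this entire recursive computation away.
constexpr int factorial(int n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}
static_assert(factorial(5) == 120, "evaluated by the compiler, not at runtime");

// Smart pointer: the integer is destroyed automatically when the last
// shared_ptr referring to it goes out of scope; no garbage collector runs.
std::shared_ptr<int> make_counter() {
    return std::make_shared<int>(42);
}
```

The static_assert line is the point of the third bullet in miniature: it fails at compile time, not runtime, because the whole factorial computation happens inside the compiler.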

So with so much fantastic control, what's not to love?
  • The syntax of C++ has become infuriatingly difficult to understand: a morass of angle brackets, & operators, and const/constexpr qualifiers that actually turn out to matter very much.
  • The variadic template layer is kind of weird: it does a syntax-directed style of recursion in which there are order-based matching operations against what often looks, at a glance, like an ambiguous set of possible matching method invocations.  A misplaced comma can send the compiler off into an infinite loop.  At best, you feel that you are coding in a bizarre fusion of Haskell or OCaml with the old C++, which honestly doesn't make for beautiful code.
  • As a teacher who enjoys teaching object-oriented Java classes, I'm baffled by the basic question of how one would teach this language to a student.  For myself, it has taken two years to start to feel comfortable with the language.  I look forward to seeing a really good textbook!  But I'm not holding my breath.
  • The language designers have been frustratingly obtuse in some ways.  For example, and this is just one of many, there are situations in which one might want to do compile-time enumeration over the public members of a class, or to access the actual names the programmer used for fields and methods (in other words, compile-time reflection).  For obscure religious reasons that date back to the dark ages, the standards committee has decided that these kinds of things are simply evil and must never, ever, be permitted.

    Why are they needed?  Well, in Derecho we have persistent template classes and would normally want to name the files that hold the persisted data using the name of the variable the data corresponds to, which would be some sort of scoped name (the compile-time qualified path) and then the variable name.  No hope. 

    And one wants to iterate the fields in order to automatically serialize the class.  Nope. 

    And one wants to iterate over the methods because this would permit us to do fancy kinds of type checking, like to make sure the client invoking a point-to-point method isn't accidentally invoking a multicast entry point.  Sorry, guy.  Not gonna happen.
  • The language designers also decided not to support annotations, like the @ notation in Java, or the C# [Attribute] notation that you can attach to a class.  This is a big loss: annotations are incredibly useful in polymorphic code, and as C++ gets more and more powerful, we get closer and closer to doing compile-time polymorphic logic.  So why not give us the whole shebang?
  • There isn't any way to stash information collected at compile time.  So for example, one could imagine using variadic templates to form a list of methods that have some nice property, such as being read-only or being multicast entry points.  That list would be a constexpr: generated at compile time.  But they just don't let you do this.  No obvious reason for the limitation.
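The recursion-by-overload style that the variadic template layer forces can be sketched like this (a toy example of my own, not Derecho code): the compiler peels one argument off the parameter pack per step, much like a fold in an ML-family language:

```cpp
// Base case: a single argument ends the recursion.
template <typename T>
T sum(T last) {
    return last;
}

// Recursive case: peel the first argument off the parameter pack
// and recurse on the rest; the "loop" is fully unwound at compile time,
// so only straight-line addition code remains in the executable.
template <typename T, typename... Rest>
T sum(T first, Rest... rest) {
    return first + sum(rest...);
}
```

A call like sum(1, 2, 3, 4) expands, inside the compiler, into the chain 1 + (2 + (3 + 4)); get one overload's pattern slightly wrong and the resulting error cascade can be spectacular.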
I could complain at greater length, but I'll stop with the remark that even with its many limitations, and even with the practical problem that compilers have uneven support for the C++ 17 features, the language is on a tear, and this is clearly the one language that will someday rule them all.

In our work on Derecho, we've become C++ nuts (led by Matt Milano, an absolutely gifted C++ evangelist and the ultimate C++ nut).  Our API is not trivial and the implementation is quite subtle.  Yet under Matt's guidance, 95% of that implementation involves compile-time logic.  The 5% that remains to be done at runtime is incredibly "thin" code, like the while(--n) { ... } loop: code that genuinely has to execute at that point in time, that actually touches data.  And because this code tends to be very efficient and very highly optimized, just a few instructions often suffice.  All of Derecho may be just instructions away from the RDMA NIC taking control and moving bytes.

Wow...  I feel like it is 1975 again (except that there is no way to fool myself into believing that I'm 20 years old.  Sigh.)  But at least we can finally write efficient systems code, after decades of writing systems code in languages that felt like plush goose-down quilts: thick and soft, and nowhere near reality...
