In the past few blog entries, I focused on issues of "specification": to what extent are we able to specify the behavior our systems guarantee, the behavior we expect clients to respect, or other kinds of assumptions we might be making about the setting, peak loads, and so forth?
Today I thought I might ask a closely related question: could formal theorem-proving tools, like the Ivy prover used by my friends at Tel Aviv University, or the NuPRL system that Bob Constable's group here at Cornell created, be used to "extract" a specification, including any assumptions or constraints the solution imposes on applications that use it?
I should mention that while these two theorem-proving systems use high-level logic languages, not all such systems are quite so mathematical in approach. If you prefer coding in a more mainstream imperative style, Microsoft's Dafny would be another system in which you could pose my question. In fact, Dafny really embodies the main idea behind this blog posting, but approaches the question in a slightly different way. In Dafny, when you specify a component of a system, you encode any preconditions or invariants as assertions in a special notation that becomes part of the documentation for the associated module and methods. Later, when compiling your code, Dafny requires you to prove that each method actually respects the constraints you expressed this way. So Dafny doesn't automate specification, per se, but it does enforce that a method must respect its specification, and it can sometimes prove this property automatically. If so, you won't need to do anything more at all. Further, if your users later violate an assumption, Dafny will refuse to compile their code, ensuring that your module can be used only as intended.
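Dafny's own notation won't run here, but the flavor of a contract that travels with a method can be sketched in Python. This is only an analogy, with names I've made up (`requires`, `halve`): Dafny proves such clauses statically at compile time, whereas this sketch merely checks them when the method is called.

```python
def requires(pred, msg):
    """Attach a precondition to a function. Dafny discharges its
    'requires' clauses with a static proof; this stand-in just
    checks the predicate at call time."""
    def wrap(f):
        def checked(*args, **kwargs):
            if not pred(*args, **kwargs):
                raise ValueError(f"precondition violated: {msg}")
            return f(*args, **kwargs)
        return checked
    return wrap

@requires(lambda x: x % 2 == 0, "even(x)")
def halve(x: int) -> int:
    return x // 2

print(halve(4))   # 2
# halve(3) raises ValueError: the caller broke the contract
```

The point of the analogy is that the contract lives with the method itself, so a caller who ignores it is stopped mechanically, not by reading documentation.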
The problem of automatically figuring out what these specifications should include looks super hard, but is fascinating to me: an ideal topic for research, if you have the math and type-theory background (or are willing to learn: nobody is born knowing that stuff...)
So here's the setting. Suppose I have a method for computing some function, F(x), such as "it computes x/2". Now, assume that I decide to run it purely with integer arithmetic. When will method F() be correct? Well, for this simple example, it will work for even values of x. Had I been using Dafny, it might refuse to compile my code without some help, such as a "requires even(x)" precondition (the definition of even(x) could be something like x % 2 == 0, for an integer x).
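The example can be made concrete in a few lines of Python (where `//` is floor division, standing in for the integer arithmetic above): F is only "correct", in the sense that doubling its result recovers the input, when even(x) holds.

```python
def F(x: int) -> int:
    """Integer version of 'compute x/2' (// is floor division)."""
    return x // 2

def even(x: int) -> bool:
    return x % 2 == 0

# F 'works' (F(x) * 2 == x) exactly when even(x) holds:
print(F(4) * 2 == 4)   # True  (even input)
print(F(3) * 2 == 3)   # False (odd input: F(3) == 1, and 1 * 2 != 3)
```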
So Dafny won't compile code containing an error, but it often can't tell me why the code is incorrect. The question becomes: could a theorem prover have figured this out for me? With this very simple example, the answer is "sort of." Take a theorem prover, and tell it that F is a "lambda" that maps x to x/2. Now define the argument x to be of type integer, and tell it that the division operator is of type (integer, integer) => integer.
If you tell NuPRL or Ivy to "verify" this code, it should fail, just as Dafny would fail, giving an example such as "x = 1" and meaning "with 1 as its input, F(1) gives an undefined result." Another possibility, for situations where the prover understands that integer division rounds down, would be to try to verify the invariant that F(x)*2 = x. This should fail, with the counterexample F(1) = 0.
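A prover refutes an invariant by exhibiting a counterexample; a brute-force search over small inputs is a crude but honest stand-in for that step. This sketch (my own framing, not how Ivy or NuPRL actually work internally) finds the smallest witness against the claimed invariant F(x)*2 = x:

```python
# A brute-force stand-in for the prover: search small inputs for
# the smallest counterexample to the claimed invariant F(x)*2 == x.
def F(x: int) -> int:
    return x // 2

counterexample = next(x for x in range(100) if F(x) * 2 != x)
print(counterexample)   # 1  (F(1) == 0, and 0 * 2 != 1)
```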
So now we know that our function doesn't "work" as intended. What we really want is for the system to suggest: "Have you considered requiring that x must be even?"
So this would be an automated derivation of an assumption, or of a constraint on the inputs. (For this blog posting, we can assume that assumption and constraint are synonyms, although there are some PL communities that use the terms in slightly different ways.)
How hard is it to automate the inference of assumptions? With current programming languages, it can be very difficult, for several reasons. A core issue is specificity and naturalness. Notice first that because false implies anything, there is a trivially universal constraint: "requires false." Every specification holds vacuously under it, yet it renders the method unusable, so it tells us nothing. This rules out any simple-minded way of solving the problem: we want a non-trivial solution.
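The vacuity point is easy to demonstrate. In this sketch (the helper `satisfies_spec` is my own invention), a Hoare-style spec {pre} F {post} holds if the postcondition is true on every input passing the precondition, so the precondition "false" makes any postcondition hold trivially:

```python
def F(x: int) -> int:
    return x // 2

def satisfies_spec(pre, post, inputs) -> bool:
    """A spec {pre} F {post} holds if post is true of every input
    that passes pre. With pre = false, no input qualifies, so the
    spec holds vacuously, no matter what post says."""
    return all(post(x, F(x)) for x in inputs if pre(x))

inputs = range(10)
honest = satisfies_spec(lambda x: x % 2 == 0, lambda x, y: 2 * y == x, inputs)
vacuous = satisfies_spec(lambda x: False, lambda x, y: 2 * y == x, inputs)
print(honest, vacuous)   # True True -- 'requires false' proves anything
```

The "honest" spec is useful because even inputs actually exist; the vacuous one is correct but worthless, which is exactly why an inferred constraint must be non-trivial.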
Next, consider that specifications can involve multiple kinds of formalisms: multiple areas of mathematics. For example, over the course of my career I've worked with systems that have real-time guarantees, protocols for which you typically specify a convergence guarantee, and more classic temporal logic properties, such as consistency for state-machine replication. Years ago, one of my students (Krzys Ostrowski) made the point that there aren't any programming languages in which all three of these "models" would easily fit. He argued for a different form of type checking, but setting that to the side, the bottom line is that if you code in a language where temporal logic is relatively easy to express (like Dafny), you probably won't be able to specify real-time properties, or probabilistic convergence properties. Each style of property has a different most natural style of formalism, and they don't necessarily compose in obvious ways.
The same goes for having a system automatically infer specifications: the odds are that any given formalism will be expressive enough to capture and perhaps infer at least some kinds of specifications in its own formalism, but that it won't even be able to "understand" properties that are outside its normal scope of coverage.
This makes specification of the automatic-specification-derivation question itself tricky. A really good derivation of assumptions would factor out each distinct element, so that we would have a small list of them, each "speaking" about a distinct aspect of the runtime environment. But a theorem prover, lacking a human notion of what it means to be distinct and natural, is much more likely to extract some single messy condition for safety, and perhaps a second messy condition for liveness ("total" correctness can be viewed as having two aspects: "nothing bad happens", and "if condition X holds, then something good will eventually happen").
Of course, if English were useful for describing this goal, the problem would be much easier. We basically want a normal human reader of the specification to be given a list of preconditions (requirements the caller must respect), each separately intuitive and sensible: "The program will work if (1) the inputs are even numbers, (2) no input is negative, and (3) no input exceeds 2^16-1..." So getting from some single condition for correctness, if we had one, into a cleanly refactored form that feels right to human users will be a puzzle. Then we would want to specify post-conditions, again in an easily understood form, and invariants.
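The refactored form we'd want might look something like this sketch (the names and the `violated` helper are hypothetical): each precondition is separately named and separately checkable, so a caller learns exactly which requirement was broken rather than receiving one opaque condition.

```python
# A hypothetical refactoring of one messy correctness condition
# into separately named, human-readable preconditions.
PRECONDITIONS = [
    ("input is even",         lambda x: x % 2 == 0),
    ("input is non-negative", lambda x: x >= 0),
    ("input fits in 16 bits", lambda x: x <= 2**16 - 1),
]

def violated(x: int):
    """Return the names of the preconditions that x breaks."""
    return [name for name, pred in PRECONDITIONS if not pred(x)]

print(violated(4))    # []  -- all preconditions hold
print(violated(-3))   # ['input is even', 'input is non-negative']
```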
But while these easy cases are probably quite feasible, any general approach to extracting a specification purely in temporal logic is guaranteed to be hard. Taking a favorite classic paper: I would love to see someone explore extracting the specification of uniform agreement (consensus). In principle, doing so should "derive" the famous Chandra and Toueg ◇W conditions: consensus with N > 2 members, of which at most one is faulty, can be achieved if the failure detection input has the property that eventually, it permanently stabilizes in a state where some correct process is never again suspected of being faulty by any correct process. A really powerful system might even be able to give a result for the more general case of N members and T failures, and derive that N > 2T.
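One combinatorial fact underneath that bound can at least be checked mechanically: any two sets of N - T processes (the processes that might be correct) share a member exactly when N > 2T, which is what lets two "majorities" of correct processes always intersect. A brute-force check (the function name is my own):

```python
from itertools import combinations

def majorities_intersect(n: int, t: int) -> bool:
    """Do every two sets of n - t processes (the possibly-correct
    ones) share at least one member? True exactly when n > 2t."""
    quorums = list(combinations(range(n), n - t))
    return all(set(a) & set(b) for a, b in combinations(quorums, 2))

print(majorities_intersect(3, 1))   # True  (3 > 2*1: quorums of 2 overlap)
print(majorities_intersect(4, 2))   # False (4 == 2*2: {0,1} and {2,3} are disjoint)
```

This is of course only the quorum-intersection ingredient, not a derivation of the full ◇W result, which also needs the eventual-accuracy property of the failure detector.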
If you follow this blog, you've noticed that I'm obsessed with our new Derecho system. For a system like Derecho, what could a tool of this kind derive? We think we know the actual specification (we understand our own protocols well), but it would be fun to see a surprising "angle" that we haven't considered. And you can't rule out such a thing happening: Derecho is a complex engineered piece of software, and even if we think we understand the conditions for progress and safety, perhaps there are aspects of the environment that Derecho depends upon, and that we haven't thought much about. For example, what are we assuming about the network layer (RDMA, in the case of Derecho)? What are we assuming about the SSD storage units? Are there any timing requirements?