Nick Benton: Leviathan

Leviathan

A Theory of the State

There are two classes of programming language whose mathematical foundations have been well understood for many years. The first consists of variations on the simple imperative language of while-programs with global variables. Since the work of Floyd and Hoare nearly 40 years ago, there has been a huge amount of work on using program logics to specify and verify programs written in languages of this class, usually under the heading of "formal methods". The second comprises more-or-less pure functional languages, featuring higher-order functions and different choices of type system (including none) and different evaluation orders. These languages have been studied extensively in the "theory and semantics" community, using operational and denotational techniques.

However, essentially no useful programming language really fits into either of these classes. C, ML, Java and C# all feature dynamically allocated mutable storage and higher-order features. Really good models and reasoning principles for this combination have proved extremely elusive, despite a huge amount of research. Even proving elementary properties of first-order code with dynamically allocated pointer structures was impractically hard prior to recent work in Separation Logic. It is easy to write a set of rules which specify precisely how any individual program executes, but seems very difficult to show that two bits of program always behave the same, or to express specifications that involve the way in which bits of mutable state are kept (partially) hidden (encapsulated, abstracted) within, say, a C# object or an ML closure. Worrying about the finer details of how to establish subtle contextual equivalences involving first-class functions and private state in ML-like languages may appear to be a rather esoteric pursuit. But it is actually a central issue in understanding sequential computation. Much applied programming-related research can be seen to involve the same basic issues, approached from different application-specific perspectives. For example

Static analysis and optimizing transformations in compilation.
Certified compilation, typed assembly language and proof-carrying code.
Logics and assertion checkers for modern languages, such as JML and Spec#.
Analyses for secure information-flow.
Fancy type systems, such as refinement types, GADTs, effect systems or ownership types.
Language-based security and encapsulation in operating systems, as exemplified by systems such as Singularity.
Behavioural checking for API usage or protocol conformance.

These topics are certainly distinct. In particular, the engineering issues are utterly different in the different cases. But the conceptual problems are very similar, despite the tendency for each area to develop "just enough" of its own theory to state some kind of correctness result for each particular system proposed.

Leviathan is an umbrella name for a line of research into semantics and program logics for a range of languages, type systems, analyses and program transformations, all of which involve state. The pipe dream is to design a unified, low-level, machine-verified foundation into which many different type systems, logics and analyses may be expressed, verified, generalized and combined. Such a foundation could, for example, form the basis for a next-generation, secure, multilanguage execution environment. But attacking that head-on looks a bit of a tall order, to put it mildly. Instead, over the last few years, with various collaborators, I've been sneaking up on it and trying to get it surrounded:

N Benton and M Hyland. Traced Premonoidal Categories. ITA 2003.
One sometimes needs to define side-effecting computations in a circular manner, but in such a way that the side-effects happen only once. Examples of this phenomenon (which is called value recursion), include linking of mutually recursive modules with top-level side effects, backpatching recursive closures and constructing models of cyclic circuits. This paper looks at the categorical semantics of this kind of recursion, generalizing a well-known correspondence between traces and Conway operators in Freyd categories to a premonoidal setting, which includes state. I now recognize that the title of this one was insufficiently cuddly.
N Benton. Simple Relational Correctness Proofs for Static Analyses and Program Transformations. POPL 2004.
We show how some classical static analyses for imperative programs, and the optimizing transformations which they enable, may be expressed and proved correct using elementary logical and denotational techniques. The key ingredients are an interpretation of program properties as relations, rather than predicates, and a realization that although many program analyses are traditionally formulated in very intensional terms, the associated transformations are actually enabled by more liberal extensional properties.
We illustrate our approach with formal systems for analysing and transforming while-programs. The first is a simple type system which tracks constancy and dependency information and can be used to perform dead-code elimination, constant propagation and program slicing as well as capturing a form of secure information flow. The second is a relational version of Hoare logic, which significantly generalizes our first type system and can also justify optimizations including hoisting loop invariants. Finally we show how a simple available expression analysis and redundancy elimination transformation may be justified by translation into relational Hoare logic.
N Benton and B Leperchey. Relational Reasoning in a Nominal Semantics for Storage. TLCA 2005.
We give a monadic semantics in the category of FM-cpos to a higher-order CBV language with recursion and dynamically allocated mutable references that may store both ground data and the addresses of other references, but not functions. This model is adequate, though far from fully abstract. We then develop a relational reasoning principle over the denotational model, and show how it may be used to establish various contextual equivalences involving allocation and encapsulation of store.
N Benton. A Typed, Compositional Logic for a Stack-Based Abstract Machine. APLAS 2005.
We define a compositional program logic in the style of Floyd and Hoare for a simple, typed, stack-based abstract machine with unstructured control flow, global variables and mutually recursive procedure calls. Notable features of the logic include a careful treatment of auxiliary variables and quantification and the use of substructural typing to permit local, modular reasoning about program fragments. Semantic soundness is established using an interpretation of types and assertions defined by orthogonality with respect to sets of contexts.
N Benton. Abstracting Allocation: The New new Thing. CSL 2006.
We introduce a Floyd-Hoare-style framework for specification and verification of machine code programs, based on relational parametricity (rather than unary predicates) and using both step-indexing and a novel form of separation structure. This yields compositional, descriptive and extensional reasoning principles for many features of low-level sequential computation: independence, ownership transfer, unstructured control flow, first-class code pointers and address arithmetic. We demonstrate how to specify and verify the implementation of a simple memory manager and, independently, its clients in this style. The work has been fully machine-checked within the Coq proof assistant.
N Benton, A Kennedy, M Hofmann and L Beringer. Reading, Writing and Relations: Towards Extensional Semantics for Effect Analyses. APLAS 2006.
We show how to give an elementary semantics to an effect system tracking read and write effects by using relations over a standard extensional semantics for the original language. The semantics establishes the soundness of both the analysis and its use in effect-based program transformations.
N Benton and P Buchlovsky. Semantics of an Effect Analysis for Exceptions. TLDI 2007.
We give a semantics to a polymorphic effect analysis that tracks possibly-thrown exceptions and possible non-termination for a higher-order language. The semantics is defined using partial equivalence relations over a standard monadic, domain-theoretic model of the original language and establishes the correctness of both the analysis itself and of the contextual program transformations that it enables. This is not strictly about state, but is a further application of the relational methodology we've been using.
N Benton and U Zarfaty. Formalizing and Verifying Semantic Type Soundness of a Simple Compiler. PPDP 2007.
We describe a semantic type soundness result, formalized in the Coq proof assistant, for a compiler from a simple imperative language with heap-allocated data into an idealized assembly language. Types in the high-level language are interpreted as binary relations, built using both second-order quantification and a form of separation structure, over stores and code pointers in the low-level machine.
N Benton, A Kennedy, M Hofmann and L Beringer. Relational Semantics for Effect-Based Program Transformations with Dynamic Allocation. PPDP 2007.
We give a denotational semantics to a region-based effect system tracking reading, writing and allocation in a higher-order language with dynamically allocated integer references. Effects are interpreted in terms of the preservation of certain binary relations on the store, parameterized by region-indexed partial bijections on locations. The semantics validates a number of effect-dependent program equivalences and can thus serve as a foundation for effect-based compiler transformations.
N Benton and V Koutavas. A Mechanized Bisimulation for the Nu-Calculus. MSR-TR-2008-129.
We introduce a Sumii-Pierce-Koutavas-Wand-style bisimulation for Pitts and Stark's nu-calculus, a simply-typed lambda calculus with fresh name generation. This bisimulation coincides with contextual equivalence and provides a usable and elementary method for establishing all the subtle equivalences given by Stark. We also describe the formalization of both the metatheory and the examples in the Coq proof assistant.
N Benton and N Tabareau. Compiling Functional Types to Relational Specifications for Low Level Imperative Code. TLDI 2009.
We describe a semantic type soundness result, formalized in the Coq proof assistant, for a compiler from a simple functional language into an idealized assembly language. Types in the high-level language are interpreted as binary relations, built using both second-order quantification and separation, over stores and values in the low-level machine.
N Benton and C-K Hur. Biorthogonality, Step-Indexing and Compiler Correctness. ICFP 2009.
We define logical relations between the denotational semantics of a simply typed functional language with recursion and the operational behaviour of low-level programs in a variant SECD machine. The relations, which are defined using biorthogonality and step-indexing, capture what it means for a piece of low-level code to implement a mathematical, domain-theoretic function and are used to prove correctness of a simple compiler. The results have been formalized in the Coq proof assistant.
N Benton, L Beringer, M Hofmann and A Kennedy. Relational Semantics for Effect-Based Program Transformations: Higher-Order Store. PPDP 2009.
We give a denotational semantics to a type and effect system tracking reading and writing to global variables holding values that may include higher-order effectful functions. Refined types are modelled as partial equivalence relations over a recursively-defined domain interpreting the untyped language, with effect information interpreted in terms of the preservation of certain sets of binary relations on the store. The semantics validates a number of effect-dependent program equivalences and can thus serve as a foundation for effect-based compiler transformations. The definition of the semantics requires the solution of a mixed-variance equation which is not accessible to the hitherto known methods. We illustrate the difficulties with a number of small example equations one of which is still not known to have a solution.

As you can see, we've used almost everything in our attempts to get to grips with state, generativity and encapsulation. Operational, concrete denotational, abstract denotational, and axiomatic semantics. High-level, medium-level and low-level languages. Manual and machine-assisted proof. Predicates, relations and parametric relations. There's an enormous amount still to do, but it's all starting to hang together in a most encouraging fashion...

...mathematics is always concerned with extensions rather than intensions. - Whitehead and Russell, Principia Mathematica