Abstract
Formal verification of a compiler is a long-standing problem in computer science and,
although recent years have seen substantial achievements in the area, most of the proposed
solutions do not scale very well with the complexity of modern software development
environments. In this thesis, I present a formal semantic model of the popular
C programming language described in the ANSI/ISO/IEC Standard 9899:1990, in the
form of a mapping of C programs to computable functions expressed in a suitable variant
of lambda calculus. The specification is formulated in a highly readable functional
style and is accompanied by a complete Haskell implementation of the compiler, covering
all aspects of the translation from a parse tree of a C program down to the actual
sequence of executable machine instructions, resolving issues of separate compilation,
allowing for optimising program transformations and providing rigorous guarantees of
the implementation's conformance to a formal definition of the language.
In order to achieve these goals, I revisit the challenge of compiler verification
from its very philosophical foundations. Beginning with the basic epistemic notions of
knowledge, correctness and proof, I show that a fully rigorous solution of the problem
requires a constructive formulation of the correctness criteria in terms of the translation
process itself, in contrast with the more popular extensional approaches to compiler
verification, in which correctness is generally defined as commutativity of the system
with respect to a formal semantics of the source and target languages, effectively formalising
various aspects of the compiler independently of each other and separately
from its eventual implementation. I argue that a satisfactory judgement of correctness
should always constitute a direct formal description of the job performed by the software
being judged, instead of an axiomatic definition of some abstract property such
as commutativity of the translation system, since the later approach fails to establish a
crucial causal connection between a judgement of correctness and a knowledge of it.
The primary contribution of this thesis is the new notion of linear correctness,
which strives to provide a constructive formulation of a compiler's validity criteria by
deriving its judgement directly from a formal description of the language translation
process itself. The approach relies crucially on a denotational semantic model of the
source and target languages, in which the domains of program meanings are unified
with the actual intermediate program representation of the underlying compiler implementation.
By defining the concepts of a programming language, compiler and compiler
correctness in category theoretic terms, I show that every linearly correct compiler
is also valid in the more traditional sense of the word. Further, by presenting a complete
verified translation of the standard C programming language within this framework, I
demonstrate that linear correctness is a highly effective approach to the problem of
compiler verification and that it scales particularly well with the complexity of modern
software development environments.