How to use Dafny to prove type safety
There are many ways even well-tested programs can go wrong (see my previous blog post on how Dafny helps). Static typing often solves a class of these programming issues entirely by preventing unintended usage of a value.
This blog post takes a deep dive… take a deep breath… into how to write the terms of a programming language, together with a type checker and an evaluator for those terms, such that the following soundness properties hold:
- [Progress] If a type checker accepts a term, then the evaluator will not get stuck on that term.
- [Preservation] If a term has a type T, then the evaluator will also return a term of type T.
We will illustrate this using the infrastructure of the following Blockly workspace. If you build a full term and pass it to “Type check”, on success, the block is surrounded with green, otherwise with red. If surrounded with green, then attaching the term to “Evaluate” and clicking on “Evaluate” will always do something, except if the term is a final value. “Evaluate” is surrounded in red if its term does not type check.
Play with the evaluator and the type checker
By the way, have you seen that Pred(False) does not type check?
Someone remarked that this is counter-intuitive, because they thought that Pred would stand for Predicate, and so it should type check. But Pred stands for Predecessor, and that’s why it applies only to numbers. The type checker’s role is also to resolve such ambiguities before execution.
Examples
Feel free to click on the examples below to load them in the Blockly workspace above.
Writing a type checker in Dafny
Writing a type-checker that guarantees that evaluation of a term won’t get stuck is not an easy task as we must dive into maths, but fortunately, Dafny makes it easier.
The term language
First, let’s define the term language used in the Blockly interface above. Note that the logic of the Blockly workspace above uses that exact code written on this page. Yeah, Dafny compiles to JavaScript too!
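As a sketch, assuming the nine constructors that appear in this post, the term language could be declared as follows. The field names (e, left, right, cond, thn, els) are illustrative choices, not necessarily the ones used by the page.

```dafny
// Terms of the small language: boolean and integer constants,
// unary operations, addition, and a conditional.
datatype Term =
  | True
  | False
  | Zero
  | Succ(e: Term)
  | Pred(e: Term)
  | IsZero(e: Term)
  | Double(e: Term)
  | Add(left: Term, right: Term)
  | If(cond: Term, thn: Term, els: Term)
```

With this declaration, Pred(False) is a perfectly valid Term value: ruling it out is the job of the type checker, not of the datatype.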
The type language
Let’s also add the two types our expressions can have.
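A minimal sketch of such a declaration, assuming the two types are named Bool and Int (capitalized so they do not clash with Dafny's built-in bool and int):

```dafny
// The two types a well-typed term can have.
datatype Type = Bool | Int
```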
The type checker
We can now write a type checker for the terms above. In our case, a type checker will take a term, and return a type if the term has that type. We will not dive into error reporting in this blog post.
First, because a term may or may not have a type, we want an Option<A> type like this:
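A common way to write such a type in Dafny is sketched below; the constructor and field names are assumptions made for illustration.

```dafny
// A value of type A that may be absent.
datatype Option<A> = Some(value: A) | None
```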
Now, we can define a function that computes the type of a term, if it has one. For example, in a conditional term, the condition has to be a boolean, while we only require the “then” and “else” part to have the same, defined type. In general, computing types is a task linear in the size of the code, whereas evaluating the code could have any complexity. This is why type checking is an efficient way of preventing obvious mistakes.
A well-typed term is one for which a type exists.
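The two definitions just described can be sketched as follows. The module name TypeCheckerSketch and the function names GetType and WellTyped are illustrative, and the datatypes are repeated inside the module so that the block stands on its own.

```dafny
module TypeCheckerSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)
  datatype Type = Bool | Int
  datatype Option<A> = Some(value: A) | None

  // Returns Some(type) when the term has a type, None otherwise.
  function GetType(term: Term): Option<Type> {
    match term
    case True => Some(Bool)
    case False => Some(Bool)
    case Zero => Some(Int)
    // Arithmetic operations need an integer argument.
    case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case Double(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
    case Add(left, right) =>
      if GetType(left) == Some(Int) && GetType(right) == Some(Int)
      then Some(Int) else None
    // The condition must be boolean; both branches must share a defined type.
    case If(cond, thn, els) =>
      var t := GetType(thn);
      if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
  }

  // A term is well-typed when the type checker finds a type for it.
  predicate WellTyped(term: Term) {
    GetType(term) != None
  }
}
```

With these definitions, GetType(Pred(False)) is None because False is a boolean, while GetType(If(IsZero(Zero), Succ(Zero), Zero)) is Some(Int). Each subterm is visited once, which matches the linear-time remark above.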
The evaluator and the progress check
First, we define what it means to evaluate a term. We use small-step semantics, meaning that each step only replaces a term or a subterm by another one. Not being stuck means that we can always find a term to “replace” or to “compute”; the two words are used as synonyms here.
There are terms where no replacement is possible: value terms. Here is what we want them to look like: either booleans, zero, positive integers, or negative integers.
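One way to express this in Dafny is sketched below, under the assumption that positive integers are chains of Succ over Zero and negative integers are chains of Pred over Zero. The module and predicate names are illustrative.

```dafny
module FinalValueSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // Zero, Succ(Zero), Succ(Succ(Zero)), ...
  predicate IsSuccs(term: Term) {
    term == Zero || (term.Succ? && IsSuccs(term.e))
  }

  // Zero, Pred(Zero), Pred(Pred(Zero)), ...
  predicate IsPreds(term: Term) {
    term == Zero || (term.Pred? && IsPreds(term.e))
  }

  // Booleans, zero, positive integers, or negative integers.
  predicate IsFinalValue(term: Term) {
    term == True || term == False || term == Zero ||
    IsSuccs(term) || IsPreds(term)
  }
}
```

Under this definition a mixed chain such as Succ(Pred(Zero)) is not a final value, since it can still be simplified to Zero.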
Now, we can write our one-step evaluation function. As a requirement, we add that the term must be well-typed and nonfinal.
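A sketch of such a one-step evaluator is shown below. It is written in the spirit of this post rather than copied from the page, and the specific reduction rules for Double and Add (Double(a) becomes Add(a, a), and Add moves one Succ or Pred from its left argument to its right one) are assumptions. The supporting definitions are repeated so that the module is self-contained.

```dafny
module EvaluatorSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)
  datatype Type = Bool | Int
  datatype Option<A> = Some(value: A) | None

  function GetType(term: Term): Option<Type> {
    match term
    case True => Some(Bool)
    case False => Some(Bool)
    case Zero => Some(Int)
    case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case Double(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
    case Add(left, right) =>
      if GetType(left) == Some(Int) && GetType(right) == Some(Int)
      then Some(Int) else None
    case If(cond, thn, els) =>
      var t := GetType(thn);
      if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
  }
  predicate WellTyped(term: Term) { GetType(term) != None }

  predicate IsSuccs(term: Term) {
    term == Zero || (term.Succ? && IsSuccs(term.e))
  }
  predicate IsPreds(term: Term) {
    term == Zero || (term.Pred? && IsPreds(term.e))
  }
  predicate IsFinalValue(term: Term) {
    term == True || term == False || term == Zero ||
    IsSuccs(term) || IsPreds(term)
  }

  // One small step. The precondition excludes ill-typed and final terms,
  // which is why the True, False and Zero cases can be left out.
  function OneStepEvaluate(t: Term): Term
    requires WellTyped(t) && !IsFinalValue(t)
  {
    match t
    case Succ(Pred(x)) => x
    case Succ(e) => Succ(OneStepEvaluate(e))
    case Pred(Succ(x)) => x
    case Pred(e) => Pred(OneStepEvaluate(e))
    case IsZero(Zero) => True
    case IsZero(e) =>
      if IsFinalValue(e) then
        assert e.Pred? || e.Succ?; // well-typed, final, and not Zero
        False
      else IsZero(OneStepEvaluate(e))
    case Double(e) =>
      if IsFinalValue(e) then Add(e, e) else Double(OneStepEvaluate(e))
    case Add(a, b) =>
      if !IsFinalValue(a) then Add(OneStepEvaluate(a), b)
      else if !IsFinalValue(b) then Add(a, OneStepEvaluate(b))
      else if a.Zero? then b
      else if b.Zero? then a
      else if a.Succ? then Add(a.e, Succ(b))
      else assert a.Pred?; Add(a.e, Pred(b))
    case If(cond, thn, els) =>
      if IsFinalValue(cond) then
        assert cond == True || cond == False; // a final value of type Bool
        if cond == True then thn else els
      else If(OneStepEvaluate(cond), thn, els)
  }
}
```

For instance, one step on If(IsZero(Zero), Succ(Zero), Zero) yields If(True, Succ(Zero), Zero), and one more step yields Succ(Zero), which is a final value.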
The interesting points to note about the function above, in a language like Dafny where every pattern must be exhaustive, are the following:
- Every call consists either of OneStepEvaluate on one sub-argument, or a transformation that reduces the size of the tree. So something is always happening here.
- All the cases are covered: Dafny does not complain!
  - For example, when encountering the case IsZero(e), if e is a final value, it must be either Pred or Succ. It cannot be True or False as it’s well-typed, and the previous pattern precludes Zero.
  - Similarly, if the condition of an if term is a final value, because it’s well-typed, Dafny knows it’s either True or False.
That concludes the progress part of soundness checking: whenever a term type-checks, there is always an applicable small step evaluation rule unless it’s a final value.
The preservation check
Soundness has another aspect, preservation, as stated in the intro. It says that, when evaluating a well-typed term, the evaluator will not get stuck and the result will have the same type as the original term. Dafny can also prove it for our language, out of the box. Well done, that means our language and evaluator make sense together!
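The preservation property can be stated as a lemma, sketched below on a reduced language: Double and Add are deliberately left out to keep the block short, and all names are illustrative. The lemma body is empty on the assumption that Dafny's automatic induction discharges it, as this post reports for the full language.

```dafny
module PreservationSketch {
  // Reduced language: Double and Add are omitted in this sketch.
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term)
    | If(cond: Term, thn: Term, els: Term)
  datatype Type = Bool | Int
  datatype Option<A> = Some(value: A) | None

  function GetType(term: Term): Option<Type> {
    match term
    case True => Some(Bool)
    case False => Some(Bool)
    case Zero => Some(Int)
    case Succ(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case Pred(e) => if GetType(e) == Some(Int) then Some(Int) else None
    case IsZero(e) => if GetType(e) == Some(Int) then Some(Bool) else None
    case If(cond, thn, els) =>
      var t := GetType(thn);
      if GetType(cond) == Some(Bool) && t == GetType(els) then t else None
  }
  predicate WellTyped(term: Term) { GetType(term) != None }

  predicate IsSuccs(term: Term) {
    term == Zero || (term.Succ? && IsSuccs(term.e))
  }
  predicate IsPreds(term: Term) {
    term == Zero || (term.Pred? && IsPreds(term.e))
  }
  predicate IsFinalValue(term: Term) {
    term == True || term == False || term == Zero ||
    IsSuccs(term) || IsPreds(term)
  }

  function OneStepEvaluate(t: Term): Term
    requires WellTyped(t) && !IsFinalValue(t)
  {
    match t
    case Succ(Pred(x)) => x
    case Succ(e) => Succ(OneStepEvaluate(e))
    case Pred(Succ(x)) => x
    case Pred(e) => Pred(OneStepEvaluate(e))
    case IsZero(Zero) => True
    case IsZero(e) =>
      if IsFinalValue(e) then False else IsZero(OneStepEvaluate(e))
    case If(cond, thn, els) =>
      if IsFinalValue(cond) then (if cond == True then thn else els)
      else If(OneStepEvaluate(cond), thn, els)
  }

  // Preservation: one evaluation step leaves the type unchanged.
  lemma OneStepEvaluatePreservesType(t: Term)
    requires WellTyped(t) && !IsFinalValue(t)
    ensures GetType(OneStepEvaluate(t)) == GetType(t)
  {
    // Intentionally empty: the proof is left to automatic induction on t.
  }
}
```

Note that the precondition of OneStepEvaluate already encodes progress, so the lemma only needs to talk about the type of the result.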
Conclusion
All the code above powers this page, which is why I can guarantee you that you won’t be able to find a term that the type checker accepts and that won’t result in a final value. Of course, in a real programming language, terms might include infinite loops, but the soundness property above is not about termination: it’s about constant progress, which you also want in embedded systems to ensure they never need a reboot.
Now that you know what a type checker is and how to implement one in Dafny, perhaps you will feel much better prepared to model and experiment on your new programming language, as the Cedar team recently did?
This is the end of the blog post. I hope you enjoyed it!
Bonus: more advanced modeling
If you are looking for some advanced concepts, feel free to continue reading! Beware, math ahead!
Sometimes, modeling the evaluator and the type-checker as functions is not enough. One wants to model them as relations, and determine some properties about these relations, such as the order of evaluation being irrelevant for the final result.
In the rest of this blog post, largely inspired by the book “Types and Programming Languages”, Chapter 8, written by Benjamin Pierce, I will illustrate one element of the proof, namely that the inductive and constructive definitions of the set of terms are equivalent. Having this equivalence enables obtaining other results that are out of the scope of this blog post, including that the order of evaluation does not matter.
With the help of this trick, it becomes possible to prove similar equivalences for different inductive and constructive definitions of:
- The set of (Expr, Expr) pairs of small-step evaluations
- The set of (Expr, Type) pairs of type checking
but I leave these as an exercise for the interested reader.
In Types and Programming Languages, section 3.2, we discover that there are two other mathematical definitions of the “set of all terms”. The first one, in definition 3.2.1, states that the set of terms is the smallest set \(𝒯\) such that:
- \(\{\texttt{true}, \texttt{false}, 0\} \subseteq 𝒯\);
- if \(t_1 \in 𝒯\), then \(\{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1\} \subseteq 𝒯\);
- if \(t_1 \in 𝒯\), \(\;\; t_2 \in 𝒯\) and \(t_3 \in 𝒯\), then \(\texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3 \in 𝒯\).
Note that these terms omit Double and Add above. This means we cannot state that this set is the same as set t: Term | true as one would like to write, but let’s continue.
We can write the inductive definition above in Dafny too:
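One possible encoding is sketched below. It assumes that the smallest set is modeled as the intersection of all sets satisfying the criteria, which is how the inductive set is described later in this post. The names InductionCriteria and AllTermsInductively come from the text; everything else is illustrative.

```dafny
module InductiveSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // A set of terms satisfies the criteria when it contains the constants
  // and is closed under succ, pred, is_zero and if-then-else.
  ghost predicate InductionCriteria(terms: iset<Term>) {
    && True in terms && False in terms && Zero in terms
    && (forall t1 :: t1 in terms ==> Succ(t1) in terms)
    && (forall t1 :: t1 in terms ==> Pred(t1) in terms)
    && (forall t1 :: t1 in terms ==> IsZero(t1) in terms)
    && (forall t1, t2, t3 :: t1 in terms && t2 in terms && t3 in terms
          ==> If(t1, t2, t3) in terms)
  }

  // The smallest such set: the terms that belong to every set
  // satisfying the criteria.
  ghost const AllTermsInductively: iset<Term> :=
    iset t: Term | forall terms: iset<Term> ::
      InductionCriteria(terms) ==> t in terms
}
```

The sets are declared as iset rather than set because they are infinite, and everything is ghost since none of it is meant to be executed.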
The second definition for the set of all terms, in definition 3.2.3, is done constructively. We first define a set \(S_i\) for each natural number \(i\), as follows:
\(S_0 = ∅\);
\(\begin{aligned}S_{i+1} = && && \{\texttt{true}, \texttt{false}, 0\} \\ && \bigcup && \{\texttt{succ}\;t_1, \texttt{pred}\;t_1, \texttt{is_zero}\;t_1 \mid t_1 \in S_i \} \\ && \bigcup && \{\texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\mid t_1 \in S_i,\; t_2 \in S_i,\; t_3 \in S_i\}\end{aligned}\).
This we can enter in Dafny too:
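A sketch of how the sets \(S_i\) and their union could be written, assuming the names S and AllTermsConstructively used in the rest of this post:

```dafny
module ConstructiveSketch {
  datatype Term =
    | True | False | Zero
    | Succ(e: Term) | Pred(e: Term) | IsZero(e: Term) | Double(e: Term)
    | Add(left: Term, right: Term)
    | If(cond: Term, thn: Term, els: Term)

  // S(i) mirrors the definition of S_i: S(0) is empty, and S(i) for i > 0
  // contains the constants plus every term built from elements of S(i-1).
  ghost function S(i: nat): iset<Term> {
    if i == 0 then
      iset{}
    else
      iset{True, False, Zero}
      + (iset t1: Term | t1 in S(i-1) :: Succ(t1))
      + (iset t1: Term | t1 in S(i-1) :: Pred(t1))
      + (iset t1: Term | t1 in S(i-1) :: IsZero(t1))
      + (iset t1: Term, t2: Term, t3: Term
           | t1 in S(i-1) && t2 in S(i-1) && t3 in S(i-1)
           :: If(t1, t2, t3))
  }

  // The union of all the S(i).
  ghost const AllTermsConstructively: iset<Term> :=
    iset i: nat, t: Term | t in S(i) :: t
}
```

Nothing in S ever builds a Double or an Add, which is what makes this definition convenient for showing that a term is not in the set.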
But now, we are left with the existential question: are these two sets the same?
We rush in Dafny and write a lemma ensuring AllTermsConstructively == AllTermsInductively by invoking the lemma InductiveAxioms(), but… Dafny can’t prove it.
If you think deeply about it, how do you know that the two are the same? It seems obvious, but why? It seems straightforward to prove that AllTermsInductively <= AllTermsConstructively because, by definition, AllTermsConstructively obeys the induction rules. But is it the smallest of such sets? What if there was an element of AllTermsConstructively that is not in AllTermsInductively? It could actually happen if, instead of a datatype, we only had a trait, and some external user could implement new terms yet unknown to us.
Here is Benjamin Pierce’s proof sketch, then translated and verified in Dafny.
1. First, prove AllTermsInductively <= AllTermsConstructively by showing that AllTermsConstructively satisfies the predicate InductionCriteria.
2. Second, for any set someset satisfying the induction criteria, for every i, we prove by induction that every set of terms S(i) is inside someset.
3. AllTermsConstructively being the union of all these S(i), it is also contained in any set satisfying the induction criteria, including AllTermsInductively which is the smallest one, so AllTermsConstructively <= AllTermsInductively.
4. From 1. and 3. we obtain AllTermsConstructively == AllTermsInductively.
Let’s prove it in Dafny!
0. Intermediate sets are cumulative
First, we want to show that, for every i <= j, we have S(i) <= S(j) (set inclusion). We do this in two steps: first, we show this cumulative effect between two consecutive sets, and then between any two sets.
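On paper, the two steps can be sketched as follows. This is a summary of the standard argument, not a transcript of the Dafny proof.
- Consecutive sets: we show \(S_i \subseteq S_{i+1}\) by induction on \(i\). For \(i = 0\), we have \(S_0 = ∅ \subseteq S_1\). Otherwise, assume \(S_{i-1} \subseteq S_i\) and take \(t \in S_i\). If \(t \in \{\texttt{true}, \texttt{false}, 0\}\), then \(t \in S_{i+1}\) directly. If \(t = \texttt{succ}\;t_1\) with \(t_1 \in S_{i-1}\), then \(t_1 \in S_i\) by the induction hypothesis, hence \(t \in S_{i+1}\); the cases \(\texttt{pred}\) and \(\texttt{is_zero}\) are identical. If \(t = \texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\) with \(t_1, t_2, t_3 \in S_{i-1}\), the same reasoning applied to the three subterms gives \(t \in S_{i+1}\).
- Any two sets: for \(i \le j\), chaining the previous result gives \(S_i \subseteq S_{i+1} \subseteq \dots \subseteq S_j\), formally by induction on \(j - i\).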
We use the annotation {:vcs_split_on_every_assert} which makes Dafny verify each assertion independently, which, in this example, helps the verifier. Yes, helping the verifier is something we must occasionally do in Dafny.
To further control the situation, we use the annotation {:induction false} to ensure Dafny does not try to prove induction hypotheses by itself, which gives us control over the proof. Otherwise, Dafny can both automate the proof a lot (which is great!) and sometimes time out because automation is stuck (which is less great!). I left assertions in the code so that not only Dafny, but you too can understand the proof.
1. Smallest inductive set contained in constructive set
After proving that intermediate sets form an increasing sequence, we want to prove that the smallest inductive set is contained in the constructive set. Because the smallest inductive set is the intersection of all sets that satisfy the induction criteria, it suffices to prove that the constructive set satisfies the induction criteria.
Note that I use the annotation {:rlimit 4000}, which tells Dafny that every assertion batch should verify using less than 4 million resource units (a unit provided by the underlying solver). This reduces the chances of proof variability during development.
2. Intermediate constructive sets are included in every set that satisfies the induction criteria
Now we want to prove that every S(i) is included in every set that satisfies the induction criteria. That way, their union, the constructive set, will also be included in any set that satisfies the induction criteria. The proof works by remarking that every element of S(i) is built from elements of S(i-1), so if these elements are in the set satisfying the induction criteria, so is the element, by induction.
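Spelled out on paper, with \(X\) standing for an arbitrary set that satisfies the induction criteria (the symbol \(X\) is introduced here only for this sketch), the induction on \(i\) reads:
- Base case: \(S_0 = ∅ \subseteq X\).
- Inductive step: assume \(S_i \subseteq X\) and take \(t \in S_{i+1}\). If \(t \in \{\texttt{true}, \texttt{false}, 0\}\), then \(t \in X\) because \(X\) contains the constants. If \(t = \texttt{succ}\;t_1\), \(t = \texttt{pred}\;t_1\) or \(t = \texttt{is_zero}\;t_1\) with \(t_1 \in S_i\), then \(t_1 \in X\) by the induction hypothesis, and \(t \in X\) because \(X\) is closed under these operations. If \(t = \texttt{if}\;t_1\;\texttt{then}\;t_2\;\texttt{else}\;t_3\) with \(t_1, t_2, t_3 \in S_i\), then \(t_1, t_2, t_3 \in X\), and again \(t \in X\) by closure.
- Conclusion: \(S_i \subseteq X\) for every \(i\), hence \(\bigcup_i S_i \subseteq X\).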
I intentionally detailed the proof so that you can understand it, but if you run it yourself, you might see that you can remove a lot of the proof and Dafny will still figure it out.
3. The constructive set is included in the smallest inductive set that satisfies the induction criteria
We can deduce from the previous result that the constructive definition of all terms is also included in any set of terms that satisfies the induction criteria. From this we can deduce automatically that the constructive definition of all terms is included in the smallest inductive set satisfying the induction criteria.
4. Conclusion with the equality
Because we have <= and >= between these two sets, we can now prove equality.
Bonus Conclusion
We were able to put together two definitions for infinite sets, and prove that these sets were equivalent. As stated in the introduction, having multiple definitions of a single infinite set makes it possible to pick the definition adequate to the job to prove other results. For example,
- If a term is in the constructive set, then it cannot be constructed with Add, for example, because it would need to be in some S(i) and none of the S(i) define Add. This can be stated as a Dafny lemma, which Dafny can verify pretty easily. However, if you put AllTermsInductively instead of AllTermsConstructively, Dafny would have a hard time figuring it out.
- If x is in the inductive set, then Succ(x) is in the inductive set as well. Dafny can figure it out by itself using the AllTermsInductively definition, but won’t be able to do it with AllTermsConstructively without a rigorous proof.
This could be useful for a rewriter or an optimizer to ensure the elements it writes are in the same set.
All that said, everything above can be a bit heavyweight for regular Dafny users. In practice, you’re better off writing the inductive predicate explicitly as a function rather than an infinite set with a predicate, so that you get both inductive and constructive axioms that enable you to prove something similar to the two results above.
The example above illustrates what Dafny does best: it automates all the hard work under the hood so that you can focus on what is most interesting to you, and even better, it ensures you don’t need to define {:axiom} yourself in this case.
I hope you give Dafny a try, and I look forward to your interesting questions!