CertiK Logo
CertiK Logo
Products
Company
incident-response
Back to all stories
Blogs
Case Studies
Formal Verification, the Move Language, and the Move Prover
11/18/2022

This is the third in a series of posts looking at the Move programming language. By now, we've covered an introduction to Move, and offered solutions to common issues we've encountered while auditing projects written in Move.

In this follow up piece, we take a deep dive into the formal verification of Move programs. Move has been designed from the ground up with verification in mind. We start by revisiting non-standard language constructs that actually help improve the security of Move programs. Subsequently, we walk through some examples of deductive formal verification in Move and highlight important lessons we learned.

Formal Verification, the Move Language, and the Move Prover

A Primer on the Move Language

Basics of Move

The Move language features many constructs that simplify the task of writing (provably) secure blockchain applications. We covered the basics of the Move language in a previous blog and the advantages it offers for writing secure smart contracts. We saw that, unlike most other smart contract programming languages, Move was designed from the ground up with verification in mind. In this follow up, we take a deep dive into formal verification of Move programs. We first revisit some of Move’s non-standard language constructs that help improve security, then walk through some examples of deductive formal verification in Move and highlight important lessons we learned while developing the examples.

The Move repository on GitHub contains a lightweight tutorial that covers basic setup and a sequence of examples for a simple token smart contract.

Restricted Set of Primitive Data Types

One distinctive feature of Move is its limited set of primitive data types. Unlike most general purpose programming languages, Move does not support signed integers, floating point numbers (IEEE-754), strings, or maps. Move also does not provide direct access to pointers, which makes formal verification significantly easier. This leaves the following list of Move primitive data types:

  • Unsigned 8, 64, and 128 bit integers (denoted by u8, u64, and u128).
  • Booleans (denoted by bool)
  • A datatype address to represent addresses on the blockchain. Addresses in Move do not allow address arithmetic, cannot be modified, and cannot be converted to (or from) integers. Another restriction is that one can only use addresses that are obtained at runtime to access resources, not to access modules. These restrictions are highly desirable for efficient formal verification!
  • Reference types that can either be mutable or immutable. References are closely linked to Move’s ownership model, in that passing references does not transfer ownership.

User-Defined Types

Move is more conventional when it comes to user-defined data types. Data is mostly represented in structures (records) with typed fields as they are known from most programming languages:

struct Asset has key, store { type: u8, price: u64 }

For now, do not worry about the key and store attributes that Asset is declared with. These are called abilities and define how ownership and storage of values of type Asset must be handled; for example, a struct that doesn’t have the store ability must not be written into persistent blockchain storage. We will revisit abilities in greater detail in the section of Move’s ownership model.

Encapsulation, Abstract Data Types, and Friends

Move enforces a rather strict encapsulation of data within modules. In particular, you may only access the fields of a structure from functions that are defined within the same module. Also, structures can only be created or deconstructed in code that resides in the same module. In that sense, Move structure declarations are treated as abstract data types that hide their inner workings from code outside of “their” module. Functions within a module are private by default and thus can only be called from within that module. Any function can be declared public, making it accessible from external code. And there is a noteworthy third protection level that will become important later on: Modules can have friends. If we define a contract coin that represents some asset and a coin_manager that provides infrastructure, we can befriend the coin and the coin_manager:

module coin { friend CoinModule::coin_manager; public(friend) do_something_privileged(); [...]

Functions that are declared as public(friend) can only be invoked from the contract itself and from all of the contract’s friends. In that way, we can grant the coin_manager contract augmented privileges compared to other contracts.

Fine-grained modules and carefully chosen friend specifications are important in Move, where restricting the operations on locally defined structures serves more purposes than abstraction and encapsulation:

  1. Self-contained modules make specification more understandable and more convincing. That becomes important when ghost variables are used to track “specification state”. In small modules, it is much easier to convince oneself that the ghost variable is actually tracking what it is supposed to track.

  2. The encapsulation provided by modules and friend declarations has important security implications. To understand those, let’s take a closer look at Move’s ownership model.

Ownership in Move

The Move language enforces strict rules on accessing global memory and has a dedicated ownership model built in. These rules have been designed to simplify development of secure Move programs as well as to simplify automated formal verification of those programs.

Before we go into details, let’s quickly dive into a major difference in data representation between the Ethereum and the Diem blockchain:

Persistent User Data on the Ethereum Blockchain

Ethereum distinguishes between contract and user addresses; data (including the bytecode of smart contracts itself) can be stored at contract addresses, whereas no such data can be attached to user addresses. A common pattern in Ethereum smart contracts is to include a mapping as a state variable that keeps track of user data. On a simplified token contract, that might look like this:

contract SampleToken { mapping(address => uint256) private _balances; function transfer(address to, uint256 amount) { _balances[msg.sender] = _balances[msg.sender] - amount; _balances[to] = _balances[to] + amount; } }

The entire transfer is implemented by modifying the mapping that is persisted on the blockchain under the SampleToken contract’s deployment address. In particular, the recipient has no control over anything stored in that mapping under her address; it is merely an index into a map that is maintained by the smart contract.

In Ethereum, data about a smart contract’s users is held in mappings that reside in the storage space allocated to the owning smart contract. Usually, Ethereum smart contracts maintain a (possibly large) mapping from addresses to the corresponding data (such as the user’s token balance, deposit, etc.). Those mappings do not distinguish between addresses that belong to a user and those that belong to another smart contract. For example, that’s why both users and smart contracts can hold tokens.

User Data in Move/Diem

In Diem based blockchains, the structuring is fundamentally different. Data belonging to a user account is stored in a storage area associated to that account. The storage area of each user account can be thought of as a mapping from contract addresses to the corresponding user data. In that way, data belonging to different smart contracts can independently co-exist in the user’s storage area. This structuring makes an explicit mapping data type unnecessary in Move - each contract can implicitly store and manipulate user data via that user’s address and the type of the underlying resource in that user’s storage area. A practical advantage of this structuring is that users need to explicitly allow smart contracts to store information to their accounts. Unsolicited airdropping, i.e. “transferring” tokens to a user account without the user’s consent, is impossible.

Abilities

Abilities are a crucial ingredient for making Move programs secure. Essentially, they restrict which operations a program may perform on the underlying data type. Take, for example, the following snippet that defines a module coin with a very simple Coin structure:

module coin { struct Coin has key, store { amount: u64 } // Function to create coins. public fun pack_coin(amount: u64): Coin { Coin { amount } } }

The Coin structure has key and store abilities attached to its declaration. The storage areas in Move blockchains hold structures, only. Each such structure is identified by its name and the address of its defining contract. If we declare a structure with the key ability, it becomes hashable, i.e. its contract address and name can serve as a key into the internal lookup tables that constitute the storage areas of addresses. In short: Structures that have the key ability can be found in global storage.

Moreover, anything that should be stored in global storage must have the store ability. All other data items are ephemeral, as they are freed by the VM once they are no longer used (or the transaction terminates).

While the key and store abilities are mainly focused on Move/Diem’s data handling, the remaining two abilities are fundamental for managing digital assets, i.e. data that is attributed a certain value. Let’s look at the details!

Move’s Ownership Model

Similar to Rust, Move uses an ownership model for deciding when an allocated data structure can be deleted. It is theoretically rooted in linear logic, which reasons about resources that can be passed around. Now imagine an instance of the Coin data structure from our previous example. In Move, any such value can represent an underlying asset (i.e. value) without any further security precautions:

  • The Coin structure does not have the copy ability. Hence, instances of Coin cannot be copied. They cannot be passed into functions via a call-by-value and cannot be copied, otherwise. The code in function counterfeiter tries to make money by simply duplicating any coin it receives:
fun counterfeiter(coin: Coin): (Coin, Coin) { let duplicate = copy coin; (coin, duplicate) }

That code will be rejected by the compiler:

moveprover1

To be absolutely sure, one might ask: What if that code wasn’t implemented in the Move language, but as a corresponding piece of bytecode? Then the Move compiler was not involved at all and could not detect the counterfeiting attempt, right?

Luckily, the Move VM’s bytecode is typed and keeps track of abilities. So even if an attacker circumvented the Move compiler by directly injecting bytecode, the Move VM’s bytecode verifier that runs on the validators and analyzes any byte code before it gets executed would reject it. So in essence, code that violates ability constraints can never be executed on the blockchain!

  • The Coin structure also lacks the drop ability. That makes the Move compiler & virtual machine reject execution of any code that would delete the coin instance. See this as a safety precaution that avoids accidentally loosing your digital assets. The only place where such a structure may be deleted is within code of its declaring module. Here and only here can our Coin be destructured and deleted:
fun unpack_coin(coin: Coin): u64 { let Coin { amount } = coin; amount }

Note the subtle difference to other programming languages: The code in Line 2 goes beyond pattern matching semantics in that it makes the field amount of the Coin structure available in a local variable. It also destroys the coin instance, i.e. the underlying digital asset is gone by that point. Only the information that was stored in it, is preserved in a local variable. To merely read the amount in a given coin instance without destroying the latter, one must use a field selector such as coin.amount.

Finally, the access restrictions provided by Move modules complete the ownership story: Structures declared within a module can only be created (also called “packed”) within the code of that module. They cannot be dropped and the only way to delete them is to destructure them within their defining module. From outside the coin module, any such structure is an abstract data type whose inner structure, lifecycle, or contents remains locked.

Exception Handling (or the Lack Thereof)

Exceptions signal a failure that often arises from unexpected or invalid inputs or inconsistent state. They are thrown when the condition in an assert! statement evaluates to false, but they also arise on numeric over- and underflows, division by zero, and other system failures (e.g. when a resource does not exist). When an exception occurs, the current transaction aborts and rolls back all state changes it made. Move does not allow catching an exception and dealing with it (as Solidity does since version 0.6). This lack of exception handling facilitates formal verification, as it simplifies reasoning about control flow considerably. And even for humans, correctly tracking the locations where exceptions may be thrown and those where they potentially get caught (if they are caught at all), is a daunting task.

At this point, we have covered most language features that set Move apart from other commonly known smart contract programming languages; we also dived into some of the details that have specifically been built into Move to make it amenable to formal reasoning. So it is time to look into the actual topic of this blog post: How to formally prove or disprove properties of Move programs!

Formal Verification of Move Programs

Move comes bundled with the a dedicated tool for formal verification, the Move Prover. The language and the prover have been developed together and are tightly integrated.

The Move Prover And Its Specification Language

The Move programming language has a specification sub-language that allows programmers to state desired properties about Move programs. Its principles rest upon theoretical contributions from Floyd, Hoare, and Meyer starting in the 1960s. In particular, the Move Prover follows the design by contract paradigm. Internally, the Move Prover translates the Move program together with its specifications into a mathematical model, which is then formally verified using the deductive verifier Boogie. Boogie uses SMT solvers such as Z3 and CVC4 as decision procedures to discharge proof obligations. Once Boogie has completed formal verification, its results are mapped back to the level of the Move language and displayed to the programmer.

If you'd like to follow along using the examples in the Move Prover, the Move project's GitHub page has a detailed guide to installing the Move Prover. There is also a readme available with instructions for building the Move Prover from sources.

moveprover11 The Move Prover (J.E.Zhong, K. Cheang, S. Qadeer, W. Grieskamp, S. Blackshear, J. Park, Y. Zohar, C. Barrett, D. Dill. “The Move Prover”, CAV 2020)

The properties that need to be verified are written in the Move Specification Language. It has all the constructs necessary to reason about modules, data structures, functions, loops, and invariants. Formulas can be expressed in full first order logic. Exciting? Then let’s jump into the details!

Specifying Pre- and Post-Conditions on Move Functions

Augmenting a Move function with pre- and postconditions essentially forms a Hoare triple {P} F(x,y) {Q} stating that whenever function F is called in a state satisfying the predicates in P and terminates, its post-state satisfies the predicates in Q. (Note: This is known as a proof of partial correctness; it is partial, as we do not reason about non-terminating executions of F(x,y). If we also succeed in proving that F(x,y) terminates whenever it is started in a state satisfying P, we establish what is called total correctness.)

Let’s start with the very basics and take a look at the following Move function:

fun add(x : u64, y : u64) : u64 { x + y }

The function add is particularly simple, as it only refers to local parameters and does not access global resources). To capture its intended behavior, one might write the following specification:

spec add(x : u64, y : u64) : u64 { ensures result == x + y; }

Essentially, this says that whenever add is called, no matter what the values of x and y are (because we didn’t put any restrictions on them), it ensures that upon termination the returned result equals the sum of x and y. Lean back for a moment and think about whether that statement is actually true.

Perhaps surprisingly, the Move Prover verifies the specification without complaints. And while it obviously holds for “usual” inputs, you might have noticed that for large enough values of x and y, an overflow occurs when evaluating x + y. The reason for the Move Prover not detecting the overflow is that by default, it verifies partial correctness and regards the occurrence of a revert as a special case of non-termination. Essentially, the Move Prover disregards failing executions by default, and the exception caused by an overflow leads to a failing execution. We can control this behavior using the aborts_if_is_strict pragma, so let’s ask the Move Prover to take failing executions into account when verifying our specification:

spec add(x : u64, y : u64) : u64 { pragma aborts_if_is_strict = true; ensures result == x + y; }

Asking it again to verify the code above produces the expected counterexample, essentially highlighting the possibility that add reverts upon overflow:

moveprover12

The aborts_if_strict pragma can be set globally to ensure all specifications take reverts into account. We will work under this assumption from here on.

Let’s modify the specification to explicitly account for overflows. The Move Specification Language has a built-in primitive aborts_if to express conditions under which a function aborts. Specifying that a function aborts on overflows is nice; even nicer would be to rule out the possibility of such overflows in the first place. This can be achieved via requires specifications, which allow us to impose a precondition stating that function add may only be called with parameters x and y whose sum does not exceed 2^(64)-1:

spec add(x : u64, y : u64) : u64 { pragma aborts_if_is_strict = true; ensures result == x + y; }

Specification Subtleties

There are subtle differences between Move expression semantics and Move specification expression semantics. One such case is revealed in the example above. Whereas over- and underflows that occur during evaluation of arithmetic expressions such as x + y at runtime make the transaction abort, arithmetic expressions in specifications operate on mathematical integers. There, the finite-width integers u8, u64, and u128 are implicitly cast into a new data type num that has an unbounded integer domain, eliminating all over- and underflows. In the example above, it is this semantic difference that makes the aborts_if condition in Line 2 well defined.

Another crucial point is the restriction to “pure” statements within Move specifications. Specifications must be side-effect free apart from operations that may change specification (ghost) variables. Accordingly, one may use pure functions from the Move program within specifications; these are functions that do not modify global state and only use expressions that are well-formed in both Move and the Move specification language.

requires vs. aborts_if

You might wonder if it is necessary to specify both aborts_i x + y > MAX_U64 and requires x + y <= MAX_U64. And indeed, there is a subtle difference:

  • aborts_if states a condition that, when satisfied, will result in an abort and a state reversal. These specifications make all situations that may lead to an abort explicit.

  • requires statements impose a constraint on all known call-sites of the function; as such, they are not restricted to reasoning about aborting transactions. In our example, each call site of add within the scope of the Move project gives rise to a proof obligation where the Move Prover has to prove that all conditions from the callee’s require statements are met. Stated differently, our require specification requires each caller of add to ensure that the requirements (i.e., x + y does not overflow) are met before executing the call.

For internal calls, a require statement helps to avoid errors (and enforces some logical discipline), as the callers have to ensure that the requirements for the callee’s correct operation are met. That is not the case for aborts_if statements; they propagate up to the caller, but they do not require the caller to meet the callee’s requirements.

Specifying both essentially requires all known call sites (i.e., those that are within the scope of the project at compile time) to ensure the requirements are met while also specifying that the function may abort if invoked from external modules.

The Deductive Power of Move

The McCarthy 91 function is a popular example of a non-trivial recursive function. It takes an integer-valued input n and returns 91 if n is smaller than 100 and n - 10, otherwise.

This is a Move implementation and specification of that function:

public fun mc91(n: u64): u64 { if (n > 100) n - 10 else mc91(mc91(n + 11)) }
spec mc91 { pragma opaque; aborts_if false; ensures (n <= 100 ==> result == 91); ensures (n > 100 ==> result == n - 10); }

Spend a minute or two to wrap your head around the function and convince yourself that it’s non-trivial. You might be surprised that the Move Prover (actually, the Boogie verifier and the Z3 SMT solver that do the reasoning) can prove its correctness as written above, with no additional annotations. It suffices to state the expected result in two ensures specifications. The prover succeeds in proving correctness almost instantaneously:

moveprover15

Opaque Specifications

You may have noticed the pragma opaque annotation in the specification of the McCarthy 91 function. That option controls how function calls are handled within proofs:

By default (i.e., without pragma opaque), the Boogie Verifier reasons about the body of each function that is called. That is technically achieved by inlining the function call, replacing the call site with a copy of the callee’s function body. This is advantageous whenever the callee is not exhaustively specified; by analyzing its body, the Move Prover has complete knowledge about the callee’s inner workings.

The pragma opaque option changes that behavior. Instead of inlining the callee, the Move Prover tries to prove all requirements of the callee (as specified in its preconditions) at the call site. If that proof succeeds, it immediately assumes that all ensures statements hold. The implementation of the function is not taken into account, but replaced entirely by the axiomatization provided via its specification.

Opaque specifications are the foundation of modular reasoning. In our case, specifying pragma opaque is necessary because our McCarthy 91 implementation is recursive, making the default approach of using inlining inapplicable.

Always Double Check

As we will see later, double checking such results is crucial, as specification errors could lead to subtle inconsistencies that would render any proofs vacuous. Let’s try to introduce a bug:

public fun mc91_buggy(n: u64): u64 { if (n > 99) n - 10 else mc91_buggy(mc91_buggy(n + 11)) }

As we can see, the Move Prover finds a counterexample for n = 100, where our buggy version yields 90 instead of 91.

moveprover17

Specifications in Code Blocks

So far, we have used ensures and aborts_if to specify the desired result of functions. In all our previous examples, the Move Prover was able to deduce the validity of our specifications without any help. However, specifying the desired outcome of a function generally does not suffice for the prover to construct a proof. For more complex algorithms, we need to help in the proof derivation by providing additional specifications within the function’s body. We demonstrate this by specifying realistic properties of a standard bubble sort algorithm.

Bubble Sort

The basic bubble sort algorithm operates on arrays. For simplicity, we’ll assume that the arrays contain no duplicate elements; handling duplicate elements makes the specification and proof significantly more complex than we have space for here. The basic idea is to traverses the array elements from left to right, swapping adjacent entries whenever the value at index k is larger than the value at index k+1. If we arrive at the end and made changes during the traversal, we have to start over again. We terminate once no swaps occur during a traversal:

public fun sort(elems: &mut vector<u64>) { if (vector::length(elems) < 2) return; let i = 0; let swapped = false; while (i < vector::length(elems) - 1) { if (*vector::borrow(elems, i) > *vector::borrow(elems, i + 1)) { vector::swap(elems, i, i + 1); swapped = true; }; i = i + 1; }; if (swapped) sort(elems); }

Specifying Properties

We obviously require any sensible sorting algorithm to return the input, well, sorted. And we might also require that the result contains all the numbers from the input. To make sure that no new elements were added during the sort, we also require that the result be the same length as the input. That leads us to this specification:

spec sort { ensures forall i in 0..len(elems)-1: elems[i] <= elems[i+1]; ensures forall a in old(elems): contains(elems, a); ensures len(elems) == len(old(elems)); }

Let’s take a closer look at the specification.

The first ensures condition states that all elements are sorted in increasing order. There is one pitfall in Move that is worth mentioning.

The Move Specification language defines ranges as a built-in datatype that is mainly intended for quantification (as we did above). The syntax for such ranges is m..n, where m and n are numbers. In Move, the range m..n denotes the set of numbers {m, m+1, m+2, ..., n-1}; n is not included! So a range of the form m..n defines a left-closed interval [m, n). This semantics will be familiar to you if you have worked with ranges in Python, but otherwise may be surprising.

Equipped with that semantic insight, the universal quantification indeed ranges over all indices but the last one and specifies an increasing sequence. Our second condition states that the resulting vector retains all numbers from the input. And our last condition states that the length of the vector remains the same.

Now let’s ask the Move Prover to verify our specification of bubble sort. Given our previous experience, the results may be surprising at first: all of our specifications are reported to be violated!

moveprover20 Violation of the sortedness property.

moveprover21 The Move Prover claims that elements can be lost.

moveprover22 The result may be a different length than the input.

moveprover23 Transaction may abort due to an underflow on empty inputs.

You may ask: Is our implementation really that buggy? And luckily, the answer is no (it is actually correct with respect to this specification, as we will prove later). So why are all these errors reported?

Internally, the Move Prover—like most formal verification tools that follow the same approach—has to deduce validity for each of the postconditions in our specifications from what it can infer from the function’s implementation. Sadly, but unavoidably, the knowledge that can be inferred automatically by the Move Prover (or any other deductive verification tool) is generally limited to a degree that it will not be able to prove arbitrary specifications. The tool needs your help!

As a downside, deductive verification tools (including the Move Prover) cannot generally distinguish between specifications they are unable to prove correct because of missing lemmas about the implementation, and specifications that are truly violated. In both cases, these tools report a violation and generate a counterexample.

Let’s take a step back and for now focus only on proving that the elements in the resulting vector are stored in increasing order:

fun sort(elems: &mut vector<u64>) { if (vector::length(elems) < 2) return; let i = 0; let swapped = false; while (i < vector::length(elems) - 1) { if (*vector::borrow(elems, i) > *vector::borrow(elems, i + 1)) { vector::swap(elems, i, i + 1); swapped = true; }; i = i + 1; }; if (swapped) sort(elems); } spec sort { pragma opaque; ensures forall i in 0..len(elems)-1: elems[i] <= elems[i+1]; }

Intuitively, we can help the Move Prover by providing an idea for proving the property along with a couple of guiding specifications. Consider the Boolean flag swapped. It is initialized to false at the beginning of a bubble sort iteration and set to true when arriving at the first pair of elements where elems[i] is larger than elems[i+1]. Once set, the swapped flag remains true until the end of the iteration.

Loop Invariants To The Rescue

One key observation is that as long as swapped remains false, no elements required swapping; that is, they already are in increasing order. We can express that formally as a loop invariant:

!swapped ==> forall j in 0..i: elems[j] <= elems[j+1]

Note: The notation elems[k] is a shorthand for accessing the k-th vector element. It is only available within specifications and not part of the general Move language.

A subset of specifications can be used within code blocks. That includes:

  • assume statements, which ask the Move Prover to treat the respective condition as unconditionally true. In that sense, an assume provides a “local” axiom that refer to a particular location within the Move program’s control flow and can be used in other derivation steps.

  • assert statements, which must be proven correct at their source code location. If their validity cannot be shown, the Move Prover aborts.

  • invariant statements, which can be used to annotate loops. These are an essential ingredient in the Move Specification language to express more advanced properties of algorithms. An invariant specification requires the Move Prover to establish an inductive argument:

    • The induction base: show that the invariant condition holds when entering the loop; and

    • The induction step: assuming that the invariant condition holds when entering the loop body (using the induction hypothesis), show that the invariant condition still holds at the end of the loop iteration.

Equipped with that machinery, let’s ask the Move Prover to establish the invariant:

fun sort(elems: &mut vector<u64>) { if (vector::length(elems) < 2) return; let i = 0; let swapped = false; while ({ // Loop invariants are defined within the loop head. spec { invariant !swapped ==> (forall j in 0..i: elems[j] <= elems[j + 1]); }; i < vector::length(elems) - 1 }) { if (*vector::borrow(elems, i) > *vector::borrow(elems, i + 1)) { vector::swap(elems, i, i + 1); swapped = true; }; i = i + 1; }; if (swapped) sort(elems); }

We added the invariant wrapped in a specification within the head of the while loop. Such loop invariants are recognized by the Move Prover, which attempts to construct the underlying inductive argument. And indeed, the tool verifies our invariant:

moveprover27 First step: Proving the loop invariant.

Now let’s use the gained knowledge and try to prove our ensures specification (which was the goal we actually set out for):

spec sort { pragma opaque; ensures forall i in 0..len(elems)-1: elems[i] <= elems[i+1]; }

moveprover29 Our sorting specification has been formally verified.

Note that at this point, verification of the other specifications still fails: If we ask the Move Prover to establish that sort never aborts, it again reports a violation:

spec sort { pragma opaque; aborts_if false; }

moveprover31

In this instance, the Move Prover complains as it was unable to derive that the subtraction vector::length(elems) - 1 cannot cause an underflow. Again, we need to think about lemmas that might help in that derivation.

From looking at the source code, we can see that the loop can only be reached for vectors of length at least two. Further, we expect the sorting algorithm to not add or remove elements from the vector. So a natural candidate for a loop invariant is to state that the length of the vector remains unchanged during loop iterations. Using that, the Move Prover succeeds both in establishing that sort never aborts and that it maintains the length of the input vector:

spec sort { pragma opaque; aborts_if false; ensures len(elems) == len(old(elems)); }

moveprover33

Using the fact that the vector length remains unmodified during a loop iteration, the prover establishes the specification.

Finally, we still need to formally verify that all numbers from the input vector are preserved in the output. For that purpose, we proceed along the same lines and annotate the loop with the invariant that (in this case) simply corresponds to our post-condition. That hint suffices to complete that formal verification.

Taking all those steps together, we arrive at a formal verification for bubble sort:

module bubble_sort { use std::vector; // Bubble sort that operates in place. public fun sort(elems: &mut vector<u64>) { if (vector::length(elems) < 2) return; let i = 0; let swapped = false; while ({ // Loop invariants are defined within the loop head. spec { invariant !swapped ==> (forall j in 0..i: elems[j] <= elems[j + 1]); invariant len(elems) == len(old(elems)); invariant forall a in old(elems): contains(elems, a); }; i < vector::length(elems) - 1 }) { if (*vector::borrow(elems, i) > *vector::borrow(elems, i + 1)) { vector::swap(elems, i, i + 1); swapped = true; }; i = i + 1; }; if (swapped) sort(elems); } spec sort { pragma opaque; aborts_if false; ensures forall i in 0..len(elems)-1: elems[i] <= elems[i+1]; ensures forall a in old(elems): contains(elems, a); ensures len(elems) == len(old(elems)); } }

WARNING: Buggy Specifications

The specification we used for bubble sort is rather intuitive and easy to prove correct. But, as we said initially, we made the assumption that our array had no duplicate elements. And indeed, the specification we proved completely ignores the possibility of duplicate elements.

In particular, if the input contains duplicates, the specification does not require that each occurrence of each numbers appears in the result. If, for example, the input contained two elements with value 7, the output is only required to contain 7 once to fulfill the specification. That leaves one slot unspecified. It could be filled by any number without violating our specification.

Actually verifying correctness of bubble sort for arbitrary inputs requires a more advanced specification; in essence, one needs to prove that the input and output are related via a permutation. Specifying that might be the topic of another (future) blog post.

A reasonable question to ask is why we didn’t include an explicit precondition stating that the array could not have duplicates, instead of saying (in English) that we were making that assumption. Such a specification might look like this: requires forall i in 0..len(elems): forall j in 0..len(elems): i != j <==> elems[i] != elems[j];

The reason for not including this precondition is that the nested forall presents difficulties for the current version of the Move Prover, and that would have distracted from the more interesting parts of the example. Without it, you can run the example yourself and prove that the specification is correct for all inputs; it’s just not a strong enough specification to guarantee that arrays with duplicates get properly sorted.

Invariants on Modules

Module invariants state properties of the module that are maintained during the lifetime of the module. Imagine a crowdsale contract where no investor may deposit as much as the initiator (the "entrepreneur”) of the crowdsale. That can be specified using a global invariant. It quantifies over all all addresses (except the one that belongs to the entrepreneur) that store deposits and states that all investments must be strictly smaller than the one of the entrepreneur.

spec module { invariant forall addr: address where exists<Deposit>(addr) && addr != @Entrepreneur: global<Deposit>(addr).amount < global<Deposit>(@Entrepreneur).amount; }

Intuitively, if there was any way a transaction could result in an investor’s deposit outgrowing that of the entrepreneur, the Move Prover would detect that and complain. Consider the following simplified deposit function:

public fun deposit(account: address, amount: u64) acquires Deposit { let limit = borrow_global<Deposit>(@Entrepreneur).amount; let resource = borrow_global_mut<Deposit>(account); assert!(account == @Entrepreneur || resource.amount + amount < limit, ERROR_TOO_MUCH); resource.amount = resource.amount + amount; }

If we omit the assertion in Line 4 that aborts any transaction that would breach our invariant, the Move Prover complains:

moveprover37

With the assertion in place, the Move prover establishes the invariant without further specifications. The source code of the example is available here.

Data Structure Invariants

Similarly, data structures can be augmented with invariants. The Move Prover attempts to prove that any such invariant holds both when a struct is created and after each update. For example, one might want to express that the denominator of a (non-negative) fraction must be positive:

struct Rational has copy, drop { num: u64, denom: u64 }
spec Rational { invariant denom > 0; }

WARNING: Data structure invariants have rather limited expressiveness in Move because they must not refer to global state or ghost variables. In essence, the invariants may only relate the structure’s fields with each other, or with constants (as in the example above) but cannot refer to anything external.

Axioms and Free Functions

We haven’t yet discussed some additional features of the Move Specification Language; these include axioms and free (or uninterpreted) functions.

An uninterpreted function has a signature, but lacks an implementation; they exist only as specification functions inside Move specifications. They can be evaluated in specification expressions, but their result is undetermined. In essence, the Move Prover will search for any possible return value that might render one of its proof obligations invalid.

A common use of axioms is to state general properties of uninterpreted functions. In that way, axioms impose constraints on the Move Prover’s interpretations of those functions.

These features are generally used for more advanced specifications. One example of this is reasoning about cryptographic hash functions. A hash function’s implementation is generally designed to make it difficult to reason about; otherwise, collisions could be constructed easily via such reasoning. A broadly accepted (although unsound) technique to reason about code that uses hash functions is to replace any such function with an uninterpreted counterpart and impose an axiom that rules out hash collisions. In the Move Specification Language, that could be encoded as follows:

public fun my_hash(x : vector<u8>): vector<u8> { <actual cryptographic hash computation> } spec my_hash { pragma opaque; ensures [abstract] result == spec_my_hash(v); ensures [concrete] result == <result of the actual hash computation> }

Pitfalls and Restrictions

The Move Specification Language and the Move Prover are still new and important features are yet to be developed. In particular, the Move Prover has not been used to prove properties that require reasoning about sequences of transactions. Or, as the authors of the tool put it in its accompanying paper (c.f. “The Move Prover, CAV 2020”), “the specifications we have written so far are local in the sense that they deal with only a single execution of a single Move function.”

The Move Prover is restricted to local reasoning.

While we can specify invariants on a single function execution, the Move Prover has no built-in support to establish inductive invariants that hold throughout a module’s lifetime. For such a use-case, one would typically specify an initial state (e.g. right after deployment of the module) and a candidate invariant.

If we could prove that the state upon deployment of the contract satisfies the candidate invariant and that it is closed under any transaction sent to the contract, we would establish a contract invariant that is valid throughout the lifetime of the contract. However, Move has no built-in support for that kind of inductive reasoning.

WARNING: The lack of support for inductive reasoning across transactions impedes the specification of some important high-level security properties, such as:

  • Establishing that a coin contract correctly keeps track of the total amount of circulating coins.

  • Proving that relations between two liquidity pools are maintained.

  • Proving that while still running, the money held in a crowd sale smart contract is always at least equal to the total amount of deposits.

Ghost variables

The Move Specification Language introduces ghost variables. They are usually used to track state information within specifications, and are never compiled into actual byte code and have no runtime overhead. In other smart-contract deductive verification tools (e.g. SmartPulse, smart-ace), ghost variables are used to keep track of a global state across multiple transactions. As the Move Prover only reasons locally, ghost variables are less important and only used infrequently.

The Good, the Bad, and the Ugly

Writing correct specifications is hard. The most dangerous pitfall is inconsistencies: anything can be deduced from an inconsistent theory, and the Move Prover will happily do so.

spec module { axiom false; } public fun add(x: u64, y: u64): u64 { x + y } spec add { ensures result == x * y; }

Here, we introduce an axiom false, thereby introducing a (simple) inconsistency. As a result, the Move Prover infers an obviously incorrect ensures statement:

moveprover42

And while this is certainly not intended, it is not a bug! It is actually the only valid answer from a purely logical point of view. The Move Prover can, however, be used to detect some inconsistencies: If invoked with the --check-inconsistency command line option, it detects the issue:

moveprover43

Still, Move’s inconsistency check only finds some straightforward inconsistencies. More subtle ones often remain undetected. There remains a strong need to add plausibility checks by intentionally introducing bugs to convince oneself that they are captured by existing specifications.

These caveats conclude our tour of formal verification in Move. In the right hands, the Move Prover is a powerful tool that helps in writing correct code. However, the quality of the specifications is essential, and writing correct specifications can be even more difficult than writing correct code.