Advanced Formal Verification of ZKP: How ZK Memory Was Proven

While we have shown what verifying one of the instructions of zkWasm is like, giving a taste of the project, readers who are already familiar with formal verification may want to know what is distinctive about a zkVM compared to verifying smaller ZK systems, or compared to verifying other types of bytecode VMs. In this post, we will discuss some interesting technical points that arise along our journey of verifying the zkWasm memory subsystem. The memory is the most distinctive part of a zkVM, so handling this well is a crucial point relevant for all other zkVM verification projects as well.

Formal Verification: VM vs zkVM

The property that we ultimately verify about zkWasm is similar to the correctness theorem for an ordinary bytecode interpreter (VM), such as the EVM interpreter used by Ethereum nodes. The property says that each step of execution of the interpreters corresponds to a valid step according to the operational semantics of the language. As the diagram shows, if the current state of the data structures in the bytecode interpreter is SL, and that state represents a state SH in the high-level specification of the Wasm machine, and the interpreter steps to a state SL', then there must exist a high-level state SH' which represents SL', and the Wasm specification must say that SH can step to SH'.

For zkVM, a similar property says that each new row in zkWasm’s execution table corresponds to a valid step according to the operational semantics of the language. As the diagram below shows, if the current state of the data structures in the row of the execution table is SR, and that state represents a state SH in the high-level specification of the Wasm machine, and the next row in the table represents a state SR', then there must exist a high-level state SH' which represents SR', and the Wasm specification must say that SH can step to SH'.

In both the VM and zkVM cases, the specification of high-level states and Wasm steps are the same, so much previous experience with verifying programming language interpreters or compilers carries over. What is specific to zkVM verification has to do with what kind of data structures make up the low-level state of the system.

First, as we described in our previous blog post, we need to account for the fact that zk-provers natively work with integers modulo a large prime number, while the Wasm specification and normal interpreters deal with 32- or 64-bit integers. This affects most parts of the implementation of the zkVM, and requires corresponding work in the verification. However, this is a "local" issue: each line of code gets more complicated because it needs to deal with arithmetic, while the overall structure of the code and the proof doesn't change.

The other big difference is how dynamically sized data structures are handled. In a bytecode interpreter written in a conventional language, the memory, value stack, and call stack are implemented as mutable data structures, and similarly the formal Wasm specification represents the memory as data type with get/set methods. For example, the Geth EVM interpreter has a Memory datatype implemented as a byte array representing physical memory, with Set32 and GetPtr methods to write and read. To implement a memory store instruction, Geth calls Set32 to modify the physical memory.

func opMstore(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
	// pop value of the stack
	mStart, val := scope.Stack.pop(), scope.Stack.pop()
	scope.Memory.Set32(mStart.Uint64(), &val)
	return nil, nil
}

In a corresponding correctness proof for such an interpreter, we would prove that, after doing an assignment to both the concrete memory (in the interpreter) and the high-level memory (in the specification), the high and low state match each other; which is easy enough.

This, however, will prove to be much more involved for a zkVM.

The zkWasm Memory Tables and Memory Abstraction Layer

In a zkVM, the execution table has columns for fixed-size data (similar to registers in a CPU), but it cannot handle dynamically-sized data structures, and those are instead implemented using lookups into auxiliary tables. The zkWasm execution table has a column EID which takes values 1,2,3..., and two auxiliary tables MTable and JTable represent memory content and call stacks.

Let’s look at the execution of the sample withdraw program below.

int balance, amount;
void main () {
  balance = 100;
 	amount = 10;
  balance -= amount; // withdraw
}

The ETable content and structure is quite straightforward. There are 6 execution steps (EID 1 to 6), each has a row listing its opcode and the address and value if the instruction is a memory read or write.

Execution Table

Each row in the MTable contains an address, a value, a "start EID", and an "end EID". The start EID is the EID of the step in the trace that wrote that value to the address, the end EID is the step that will next write to that address. (It also contains a count, which we will discuss more later.) The circuit for the Wasm memory read instruction uses a lookup constraint to enforce that there exists a suitable entry in the table, such that the EID of the read instruction is in the start–end range. (The JTable similarly has a set of entries corresponding to call stack entries, and each entry is tagged with the EID of the Call instruction that created it.)

Memory Table

This memory system is quite different from a regular VM interpreter: instead of a mutable memory which is updated step by step, the MTable contains the history of all memory accesses in the entire trace. To make the work of the programmer easier, zkWasm provides an abstraction layer in the form of two convenience functions called

alloc_memory_table_lookup_write_cell

and

alloc_memory_table_lookup_read_cell

with the following arguments:

arguments

For example, the code in zkWasm that implements the memory store instruction contains one use of the write alloc function:

let memory_table_lookup_heap_write1 = allocator
    .alloc_memory_table_lookup_write_cell_with_value(
	"store write res1",
	constraint_builder,
	eid,
	move |____| constant_from!(LocationType::Heap as u64),
	move |meta| load_block_index.expr(meta),   // address
	move |____| constant_from!(0),             // is 32-bit
	move |____| constant_from!(1),             // (always) enabled
    );
let store_value_in_heap1 = memory_table_lookup_heap_write1.value_cell;

The alloc function will take care of the lookup constraint between the tables as well as the arithmetic constraints relating the current eid to the MTable entry. So the programmer can think of the tables as representing an ordinary memory, and that after this line the value store_value_in_heap1 has been assigned to the address load_block_index.

Similarly, memory read instructions are implemented using the read alloc function. In the example trace above, there is one read constraint for each load instruction and one write constraint for each store instruction, each of which is satisfied by one of the entries in the MTable.

mtable_lookup

The structure of a verification development should correspond to the abstractions used in the software being verified, so the proofs can follow the same logic as the code. For zkWasm, that means that we want to verify the MTable circuits and the "alloc read/write cell" functions as one module, with an interface that reflects the intention that it acts like a mutable memory. Given such an interface the verification of the per-instruction circuits can be done similarly to a conventional interpreter, with the extra ZK complexity encapsulated in the memory module.

In the verification we make the intuition that the table represents a mutable structure explicit, by writing a function memory_at type which explicitly scans through the MTable and builds the corresponding key-value dictionary. (The variable type ranges over heap, value stack, and globals, which are three different kinds of Wasm memory.) We then prove that the constraints generated by the alloc functions are equivalent to mutating the corresponding dictionary using set and get functions. In simplified pseudo-code, we prove:

For each eid, if the constraints

(memory_table_lookup_read_cell eid type offset value)

hold, then

get (memory_at eid type) offset = Some value

And if the constraints

(memory_table_lookup_write_cell eid type offset value)

hold then

(memory_at (eid+1) type) = set (memory_at eid type) offset value

Per-instruction verification can then be done in terms of setting and getting dictionary items, similar to non-ZK bytecode interpreters.

The zkWasm Counting Scheme

However, the simplified description above did not explain the full contents of the MTable and JTable. In a zkVM these tables are under the control of an attacker, so with the tables described so far, they could simply insert an extra row to cause the memory load instruction to return an arbitrary value.

For example, in our withdrawal program case, a hacker can inject a fake memory write of $110 to balance, right before the withdrawal action. This can be done easily with injecting a row in MTable and modifying the corresponding values in existing cells of MTable and ETable, and will results in the withdrawal being “free”, as the balance remains at $100 afterwards.

Execution Table of A Hack Memory Table of A Hack

To ensure that the MTable and JTable only contain entries generated by actual return and call instructions, zkWasm uses a scheme to count the number of entries. Specifically, the MTable has a column which keeps a running count of the number of entries, while the execution table has a table counting the intended number of memory write operations of each instruction. One equality constraint ensures that the counts are equal. The intuition is simple: each memory write is counted, and there are exactly that many entries in the MTable, so therefore there can not be any extra entries created by an attacker.

Execution Table2 Memory Table2

That statement is too hand-wavy to be accepted by a mechanized proof assistant, so in the course of the verification, we need to make it more precise. As a first step, we need to correct the statement of the writing lemma above. We define a function mops_at eid type which counts the number of MTable entries with a given eid and type (most instructions will create 0 or 1 entries at one eid).The full statement of the theorem has one more precondition, saying that there were no spurious MTable entry:

If the constraints

(memory_table_lookup_write_cell eid type offset value)

hold and so do

(mops_at eid type) = 1

then (memory_at(eid+1) type) = set (memory_at eid type) offset value

We can already see that this forces us to be more precise than the intuition above. It is not enough to prove that the total number of entries is equal to the total number of memory writes in the trace (which would follow easily from the equality constraint). In order to prove the correctness of the instructions we need to know that each instruction corresponds to the right number of MTable entries. For example, we need to rule out that the attacker omitted the entry for one instruction in the trace and created a malicious extra entry for a different instruction.

To prove this we use an argument to bound the number of entries for a given instruction from above and from below, which works in three steps. First, for each type of instruction in the trace we can define how many memory entries it should create, based on the type of instruction. We call the intended number of writes from the ith element to the end of the trace instructions_mops i, and we call the actual number of entries in the MTable, from the ith instruction to the end, cum_mops (eid i). By analyzing the lookup-constraints for each instruction, we prove that it creates at least as many entries as intended, from which it follows that each segment [i ... numRows] of the trace created at least as many entries as intended:

Lemma cum_mops_bound': forall n i,
  0 ≤ i ->
  i + Z.of_nat n = etable_numRow ->
  MTable.cum_mops (etable_values eid_cell i) (max_eid+1) ≥ instructions_mops' i n.

Second, it follows trivially that if the table also has as most as many entries as intended, then it has exactly the right number:

Lemma cum_mops_equal' : forall n i,
    0 ≤ i ->
    i + Z.of_nat n = etable_numRow ->
    MTable.cum_mops (etable_values eid_cell i) (max_eid+1) ≤ instructions_mops' i n ->
    MTable.cum_mops (etable_values eid_cell i) (max_eid+1)  = instructions_mops' i n.

Now we are ready for step Three. Our correctness theorem states that for every n, cum_mops and instructions_mops agree for the part of the tables from n to the end:

Lemma cum_mops_equal : forall n,
    0 <= Z.of_nat n < etable_numRow ->
  MTable.cum_mops (etable_values eid_cell (Z.of_nat n)) (max_eid+1) = instructions_mops (Z.of_nat n)

The proof is by induction on n. For the very first row in the table, we have the zkWasm equality constraint saying that the total number of entries in MTable is correct, i.e. cum_mops 0 = instructions_mops 0. For a later row, the induction hypothesis tells us that

cum_mops n = instructions_mops n

and we want to prove

cum_mops (n+1) = instructions_mops (n+1)

Note that cum_mops n = mop_at n + cum_mops (n+1)

and

instructions_mops n = instruction_mops n + instructions_mops (n+1)

So we have

mops_at n + cum_mops (n+1) = instruction_mops n + instructions_mops (n+1)

We previously proved that each instruction creates at least as many entries as it should, i.e., that

mops_at n ≥ instruction_mops n.

So it follows that

cum_mops (n+1) ≤ instructions_mops (n+1)

which is exactly what we need to apply the second lemma.

(The proof for the JTable uses the same sequence of lemmas to prove that each call instruction creates exactly one JTable entry, so this proof technique applies generally. However, there we need some additional work to prove the correctness of the return instructions. The eid of the return is not the same as the eid of a call instruction that created the call frame, so we also need an additional invariant stating that the eid labels are monotonically increasing in the trace.)

Spelling out arguments in this level of detail is typical for formal verification, which is why it often takes longer to verify a particular piece of code than to write it in the first place. Is it worth the effort? In this case it was, because while doing the proofs we uncovered a bug in a previous version of the JTable counting scheme. We already described the bug in detail in a previous post—to summarize, the old code counted both call and return instructions, and an attacker could make room for a fake JTable entry by adding additional return instructions to the trace. The incorrect counting scheme still satisfied the intuitive property that every call/return instruction is counted, but when we tried to elaborate that intuition into a more precise theorem statement it became clear that it was problematic.

Make the Proof Modular

From the discussion above, we can see that there is a kind of circular dependency between the proofs about per-instruction circuits and the proofs about the count column of the execution table. To prove that the instruction circuits are correct, we need to reason about memory writes in them; which requires knowing the number of MTable entries at that eid; which requires proving the memory write count in the execution table is correct; which requires knowing that each instruction makes a minimum number of memory writes. Make the Proof Modular

There is one additional consideration, which is that the zkWasm project is quite large, so the verification needs to be structured so the work can be split up between several verification engineers. Thus, the proof of the counting scheme requires some care to untangle. For example, for the LocalGet instruction the two theorems look like this.

Theorem opcode_mops_correct_local_get : forall i,
  0 <= i ->
  etable_values eid_cell i > 0 ->
  opcode_mops_correct LocalGet i.

Theorem LocalGetOp_correct : forall i st y xs,
  0 <= i ->
  etable_values enabled_cell i = 1 ->
  mops_at_correct i ->
  etable_values (ops_cell LocalGet) i = 1 ->
  state_rel i st ->
  wasm_stack st = xs ->
  (etable_values offset_cell i) > 1 ->
  nth_error xs (Z.to_nat (etable_values offset_cell i - 1)) = Some y ->
  state_rel (i+1) (update_stack (incr_iid st) (y :: xs)).

The first theorem states

opcode_mops_correct LocalGet i

which, expanding out the definitions, means that the instruction creates at least 1 MTable entry in the row i (the number 1 is specified in the zkWasm opcode configuration for LocalGet).

The second theorem is the full correctness theorem for the instruction, and it takes

mops_at_correct i

as a hypothesis, which means that the instruction creates exactly 1 MTable entry. A verification engineer can prove the two theorems independently, which are then used together with the proof about the execution table to conclude the correctness of the full system. Notably, all the per-instruction proofs can be done at the level of read/write constraints, without needing to know anything about the implementation of the MTable. The project therefore divides into three parts that can be handled independently.

memory access correctness

Summary

On a line-by-line basis, verifying the circuits of a zkVM is not so different from verifying domain specific ZK applications, because it requires the same reasoning about arithmetic constraints. And, on a high level, the project requires many of the same methods that are used for formal verification of programming language interpreters and compilers. The main differences concern dynamically-sized states. However, by carefully structuring the verification to match the layers of abstraction used in the implementation, the effect of those differences can be minimized, so that each instruction can be verified independently with respect to a get-set interface similar to a conventional interpreter.