Advanced Formal Verification of ZKP: Verifying a ZK Instruction

A zero-knowledge virtual machine (zkVM) creates a short proof object which serves as evidence that a particular program can be run on a certain input and terminate successfully. In Web3 this enables high throughput, because the L1 nodes only need to verify short proof objects that a smart contract will transition from an input state to an output state, while the actual execution of the contract code can be done off-chain.

A zkVM prover works by executing the program, generating a trace of every step of execution, and then translating the data in the trace into a set of tables of numbers (this is called “arithmetization”). The numbers must satisfy a set of constraints (the “circuit”) which contains equations between particular table cells, fixed constants, database lookup constraints between the tables, and polynomial equations (“gates”) which are satisfied by each pair of adjacent table rows. The on-chain verifier can check that there exists some table satisfying all the constraints, but does not see the individual numbers in the table.

Each VM instruction’s execution is subject to multiple such constraints, here we refer to this set of constraints for a VM instruction as its “ZK instruction”. Below is an example of zkWasm load instruction constraints, written as code in Rust.

unnamed - 2024-04-29T005339.970

Formal verification of a ZK instruction is done by doing formal reasoning about this code, which we first translate into a formal language.

unnamed - 2024-04-29T005449.515

In zero-knowledge virtual machines, the accuracy of each constraint is critical. A single error in any constraint can potentially allow an attacker to submit a misleading proof, suggesting that the tables represent a valid execution of the smart contract when they do not. This vulnerability is more pronounced in zkVMs than in traditional non-zero-knowledge chains like Ethereum. Ethereum operates with multiple nodes, each running different implementations of the Ethereum Virtual Machine (EVM), making it less likely for all nodes to share the same error. Conversely, a zkVM typically operates under a single VM implementation, which can increase the risk of system-wide vulnerabilities.

Moreover, the opaque nature of zkVM transactions exacerbates these vulnerabilities. Unlike traditional non-ZK chains, where transaction computations are openly submitted and recorded on the blockchain, zkVMs do not store these details on-chain. This lack of transparency makes it difficult to ascertain the specifics of an attack or even to identify that an attack has occurred.

zkVM systems demand the utmost rigor. Unfortunately, zkVM circuits are challenging to get right.

Why So Hard?

Virtual machines (VMs) represent some of the most complex components within Web3 systems. The power of smart contracts, which are central to the functionality of Web3 systems, relies on these sophisticated and flexible VMs. They are designed to support a vast array of instructions and states necessary for general-purpose computation and storage. For instance, the geth implementation of Ethereum's Virtual Machine (EVM) required over 7,500 lines of Go code to function effectively. The zero-knowledge (ZK) circuits that enforce the execution rules of these instructions are equally complex. In the case of zkWasm, the implementation of these ZK circuits involved more than 6,000 lines of Rust code.

zkWasm Circuit Architecture zkWasm Circuits Architecture

Compared to application-specific ZK circuits used in fixed application ZK-systems (e.g., private payments), the scale of zkVM circuits is significantly larger. The set of constraints in zkVM circuits can be orders of magnitude greater, and the arithmetization tables may feature hundreds of columns and millions of numbers.

unnamed - 2024-04-29T005717.709

What Does This Mean?

The goal is to validate the zkWasm XOR instruction. Specifically, we aim to verify that executing this instruction will always result in a new valid ZK VM state. This means that each row in the zkWasm execution table should correspond to a valid Wasm VM execution trace; every subsequent row should emerge from executing a VM instruction. Below is the formal statement of the XOR instruction’s correctness.

unnamed - 2024-04-29T005824.947

Here state_rel i st testifies that state st is a valid ZK VM state for the smart contract at step i. As you might have guessed, it is not trivial at all to prove state_rel (i+1) ….

What Does This Take?

While the XOR instruction’s computational semantics are simple—calculate the bitwise XOR of two integers and return the resulting integer—the complexity arises in the operational details. The former needs to be fetched from a stack memory, and the latter needs to be stored back into the same stack. Additionally, the instruction execution step should appear in an execution flow of the entire smart contract, with the correct order and timing of execution.

Screenshot 2024-04-29 at 1.01.16 AM

So the effect of the XOR instruction should be to pop two numbers off the value stack, push back their XOR result, and increment the program counter to the next instruction in the smart contract.

Screenshot 2024-04-29 at 1.01.47 AM

As one can tell, the overall correctness property is actually quite similar to what we would verify for a conventional bytecode VM (such as the EVM interpreter in an Ethereum L1 node). It relies on a definition of the machine state (stack memory and execution flow here) at a high level of abstraction and definitions about the intended high-level behavior of each instruction (arithmetic logic here).

However, as we will soon see, the approach to verifying correctness in a zkVM differs significantly due to the unique aspects of Zero-Knowledge Proofs (ZKP) and the structure of zkVMs. Specifically, the verification of even a single instruction such as XOR hinges on the accuracy of multiple tables within zkWASM. These include a range table that bounds the size of numbers, a bit table for intermediate bitwise computations, and the execution table where each row encapsulates a constant-size VM state, analogous to the data held in the registers and latches of a physical CPU. The memory table and jump table represent the dynamically growing VM state, encompassing memory, value stack, and call stack.

Step 1: Stack Memory

Similar to a conventional VM, one wants to make sure that the XOR instruction’s two integer arguments are correctly read from the stack, and the resulting XOR value is accurately written back to the stack. The stack's formalization in a zkVM also appears familiar, incorporating elements like global and heap storage, which are not utilized by the XOR instruction.

unnamed - 2024-04-29T010232.433

zkVMs use a sophisticated scheme to represent dynamic data as ZK provers do not natively support data structures like stacks or arrays. Instead, the memory table contains a row for each value ever pushed to the stack, with columns indicating at which times that entry should be considered valid. The contents of these tables are, of course, under the control of any attacker, so there must also be some kind of constraint that ensures that the table entries really correspond to actual push instructions in the contract execution. This is done by carefully counting the number of pushes in the program trace. When verifying each instruction, we need to ensure that the counts are always correct. There is also a set of lemmas relating the constraints generated by the individual instructions to the table lookup and time range checks that implement the stack operations. At the top level, the memory operation count constraint is defined as follows.

unnamed - 2024-04-29T010309.257

Step 2: Arithmetic Operations

While executing a bitwise XOR operation may appear straightforward—given that our physical computer CPUs can perform this in a single operation—the process is more complex in a zkVM.

For zkVM, this is actually not straightforward. The only arithmetic operations natively supported by a zk-prover are addition and multiplication. To do bitwise operations, the VM uses a more involved scheme where a fixed table specifies the results on bytes, and another table is used as a "scratch space" to show on multiple table rows how a 64-bit number can be decomposed into 8 bytes and then the results re-composed again.

In a zero-knowledge VM, the only arithmetic operations natively supported are addition and multiplication. To facilitate bitwise operations, the VM employs a detailed scheme involving fixed tables. These tables not only specify the results for bytes but also serve as "scratch space" to demonstrate over multiple table rows how a 64-bit number can be decomposed into 8 bytes, with the results subsequently reassembled.

unnamed - 2024-04-29T010355.299 Snippets of zkWasm Bit Table Specs

What would typically be a simple XOR operation in a conventional programming language thus requires a comprehensive set of lemmas to verify the correctness of these auxiliary tables.

unnamed - 2024-04-29T010445.040

Step 3: Execution Flow

Similar to conventional VM, one wants to make sure the program counter is updated correctly. For sequential instructions such as XOR, it's necessary to increment the program counter by one after each execution step.

As zkWasm is designed to execute Wasm code, it is also imperative to ensure that Wasm memory invariants are consistently maintained throughout the execution.

While conventional programming languages have native support for data types such as booleans, 8-bit integers, 64-integers, etc., in a ZK circuit the variables always range over integers modulo a large prime number (≈ 2254). Since a VM typically operates on 64-bit quantities, the circuit programmer needs to use a system of constraints to force them to have the right size, and the verification engineer needs to keep track of those size invariants throughout the development. Reasoning about execution flow and the execution table involves all other auxiliary tables, thus all data sizes will need to be checked correctly.

unnamed - 2024-04-29T010522.941

For zkVMs, verifying control flow requires a similar approach to memory verification, involving a comprehensive set of lemmas. Each call and return instruction must interact with the control-flow stack, which, like the value stack, is implemented using a table-based scheme. Although the XOR instruction does not modify this stack, its verification still necessitates tracking to ensure the control flow operations count remains accurate.

unnamed - 2024-04-29T010555.929

Verifying the Instruction

Let’s put things together and verify the end-to-end correctness statement of the zkWasm XOR instruction. This verification occurs within an interactive proof environment where each formal construct and logical deduction step is mechanically verified at the highest level of rigor.

unnamed

As we can see, formally verifying a zkVM circuit is feasible, but it is a substantial undertaking that requires understanding and tracking a sophisticated set of invariants. This reflects the complexity of the underlying software: each of the lemmas involved in the verification is something that the original circuit programmer needed to get right. Given the high stakes, employing a formal verification system to machine-check these lemmas—rather than relying solely on human diligence—is essential to ensure the reliability and security of the system.