Formal Verification of ERC-20 Tokens

The ERC-20 Token Standard

Following the successful launch of Ethereum, the widespread proliferation of tokens led to varying implementations and coding practices. Recognizing the problems this could cause, the Ethereum Foundation agreed on the ERC-20 standard that eases interoperability between tokens and defines the interfaces and expected behaviors of token contracts. Since September 2022, CertiK has routinely included the formal verification of ERC-20 contracts during our audits. As of the time of writing, we had formally verified ERC-20 properties for 416 contracts in 164 projects (some projects have more than one contract implementing the ERC-20 interface). Before we go into details, let’s take a quick look at some examples of ERC-20 expected behaviors.

Token Transfers

At the core of an ERC-20 token contract are the public API functions transfer() and transferFrom(), which implement the secure transfer of tokens from one address to another.

The transfer() function lets an account holder transfer a certain amount of their own tokens to another address. The code snippet below shows the essential idea. The smart contract maintains a mapping that stores the number of tokens held by each address.

Screenshot 2023-02-08 at 10.23.06 AM

Basic Transfers

The transfer() function updates the smart contract’s mapping by deducting the amount of tokens from the account that triggered the transfer and crediting it to the recipient's account. Once the update is complete, it emits an event to notify observers that the transfer has occurred.

Approved Transfers

ERC-20 compliant tokens also allow their holders to approve others to transfer certain amounts from their accounts. The other parties may be either people or smart contracts (e.g. decentralized exchanges). Let's take a closer look at a simplified version of transferFrom(). The token contract maintains a two-dimensional mapping in a state variable _allowances, storing the amount of each holder’s tokens each spender has at her command. For example, an entry of _allowances[Alice][Bob] = 1000 would allow Bob to spend 1000 tokens from Alice’s account. Obviously, any transfer on behalf of someone else needs to update the involved allowances_ entry. In our example, that is done by subtracting the number of transferred tokens from the spender's allowance.

sec approved transfers simple erc20 transferfrom

Approving Allowances

The ERC-20 function approve() grants an allowance of amount tokens to the address in spender.

Screenshot 2023-02-08 at 10.25.10 AM

Querying Token Attributes

Finally, the ERC-20 standard defines functions that allow us to query the state of the smart contract. The optional functions name(), symbol(), and decimals() can be used to retrieve the token’s full name, its symbol, and the number of decimal digits used in its fixed-point representation. Also, any ERC-20 token is expected to maintain a record of the number of circulating tokens. That total supply can be queried by a public API function totalSupply(). Finally, the public functions balanceOf() and allowance() can be used to retrieve the balance of tokens held by an account and the allowance that one account has to spend another account's tokens.

Common Pitfalls When Writing ERC-20 Token Contracts

Before we dive into how formal verification for ERC-20 token contracts works, let's pause for a moment and take a look at some of the problems CertiK has encountered during audits of ERC-20 contracts.

Overflows and Underflows

Sometimes, the difference between a secure ERC-20 token and a vulnerable one is a single line of code. Recall our implementation of transferFrom() from above. Imagine that Bob attempts to transfer an amount of tokens from Alice's account using transferFrom(). If Bob is a bad actor, he could try to transfer an amount that exceeds the allowance he has for Alice's address. (If Alice did not grant Bob any allowance, his allowance would simply be zero.) Such an attempt would fail in our example, as the Solidity compiler (as of version 0.8.0) automatically introduces over- and underflow checks when evaluating arithmetic expressions. In our example, Bob's attempt to spend more tokens than he has been granted by Alice's allowance will result in an underflow when evaluating the expression in Line 7. Such an arithmetic error triggers an exception that causes the entire transaction to revert with no effect. As a result, Bob's attempt to exceed his allowance will fail.

sec overflows and underflows transferfrom

If a pre-0.8.0 version of the Solidity compiler were used, however, the above code would behave very differently; over- and under-flows would not be detected by the compiler generated code and would silently result in arithmetic wrap-around. In those earlier versions of the Solidity compiler, Bob's attempt to transfer an amount that is larger than his available allowance would cause the subtraction in Line 7 to underflow. If _allowances[from][spender] is strictly less than amount, the result would be negative. Since _allowances maps to values of type uint256, i.e. 256-bit unsigned integers, the bits representing the negative number from the subtraction (in two’s complement form) would erroneously be interpreted as representing a huge positive number. Not only would Bob succeed in spending Alice's tokens beyond what she approved, but the underflow in transferFrom() would silently increase Bob's allowance to an unimaginably large number.

Refactorings Gone Awry

Another pattern that we sometimes encounter is an incorrect attempt to refactor ERC-20's transfer logic. You might have observed that transfer() and transferFrom() are rather similar. Isn't it good practice in software engineering to factor out common functionality?

While this is true in general (and correctly done in libraries like OpenZeppelin), it can go terribly wrong if not done carefully.

sec refactorings gone awry transfer and transferfrom

The code updates the balances, emits the appropriate event and performs some plausibility checks to prevent transfers to invalid addresses. Using this internal helper function, the public API functions transfer() and transferFrom() can be simplified. However, in the effort to simplify the code, the allowance update has been omitted from transferFrom(). This allows anyone with a positive allowance to exceed the limit imposed by the tokens’ owner over her tokens by repeatedly executing transfers from her account. In that way, a malicious actor can drain the account entirely!

Formal Verification of ERC-20 Token Contracts

At CertiK, we have written property templates that precisely describe the expected behaviors of ERC-20 token contracts. These templates are generic: our tools analyze the implementation details of each smart contract and adapt the templates accordingly. In this way, we obtain a set of 38 contract-specific formal properties that, taken together, cover the basic expected behavior of an ERC-20 token contract. This includes aspects such as:

A successful transfer correctly deducts the specified amount from the sender's balance and credits it to the recipient's balance.
A successful invocation of transferFrom() decreases the allowance of the sender over the spender's account by the transferred amount.
A transfer must fail if the requested amount of tokens exceeds the number of tokens available to the sender.
A transfer must fail if it causes the balance of the recipient to overflow.
Transfers to or from the zero address must always be rejected.

Formalizing Behaviors of Smart Contracts

As usual when reasoning about programs, precision is key. In order to prove that a smart contract behaves according to a property, the property needs to be stated in precise, unambiguous, and machine interpretable language. Moreover, the reasoning needs to take into account the environment that a smart contract operates in. This is why CertiK incorporates a mathematical model that describes the details of transactions within a blockchain. In that model, a blockchain is understood as a database that stores smart contracts together with their data and where the data can be modified by transactions. A transaction executes the token's underlying bytecode, which in turn may modify its associated data. As in the Ethereum blockchain, transactions are grouped into blocks, where each block summarizes the changes triggered by all its transactions.

Given that the data stored in a smart contract evolves over time, it is natural to reason about the sequences of transactions that may be executed. Is there a sequence of transactions that could drive our smart contract into an insecure state?

That’s why the formalism that we use to express properties reasons about relevant state changes (the blue circles in the image below) along potentially infinite sequences of transactions:

sec formalizing behaviors of smart contracts blockchain drawing

Leveraging this insight, we can prove that under reasonable assumptions about the current state of the blockchain, a transaction that executes a public or external function of an ERC-20 token contract changes the blockchain state only as expected.

For example, let's assume that Alice and Bob have 𝒙 and 𝒚 tokens, respectively, and that Alice sends a transaction to the underlying ERC-20 token contract requesting to transfer 𝒛 tokens from her account to Bob's. Assuming that Alice's balance suffices (i.e. if 𝒛 ⩽ 𝒙), one would naturally expect that the call to transfer() will succeed and update the blockchain state such that Alice's balance is decreased and Bob's is increased by 𝒛.

Generalizing this idea, we can write a mathematical formula that symbolic model checking algorithms can reason about. In essence, we state that whenever our client’s transfer() function is executed, we expect that it eventually finishes and updates the balances accordingly.

This intuition can be written formally in linear temporal logic (LTL), which includes the operators always (denoted □) and eventually (denoted ◊). Informally, always means that a property always holds during transactions. Eventually means that a property will hold at some point in the future. Always can be used to specify properties such as "no user's balance ever becomes negative" and eventually can be used to specify properties such as "if I ask to withdraw money from a contract, I eventually receive my money."

To fully understand their meaning, imagine that you could observe the blockchain's transactions flying by and attach labels to them. Each label could summarize information about its underlying transaction such as "function has been started" or "tokens are sent from Alice to Bob". LTL formulas express properties about such label sequences. For example, assuming we labeled transactions with a predicate alice_transfers_to_bob(y). The LTL formula ◊alice_transfers_to_bob(50) then formalizes the expectation that we will encounter a transaction now or in the future where Alice transfers 50 tokens to Bob.

Generalizing this idea towards verifying ERC-20 token behavior, let's label transactions to describe which of the contract's public API functions was started or finished. Then the following formula expresses that for all future transactions, if a function transfer() is invoked with a sufficient balance, it is expected to eventually finish in a state where the balances are updated according to the original request:

Screenshot 2023-02-08 at 10.38.13 AM

Symbolic Model Checking

CertiK audit reports contain formal verification results for ERC-20 tokens by default. We search for contracts that implement the ERC-20 interface and translate each of those contracts into a mathematical model. That model contains a formal description of all possible executions of the smart contract on the blockchain as well as side-conditions that describe the blockchain environment (e.g. transactions, exceptions, external function calls, rollbacks, etc.).

Screenshot 2023-02-08 at 10.39.44 AM

This model is the last missing piece for formal verification. Once it is constructed, we combine it with our predefined properties that capture the expected behaviors of ERC-20 tokens and feed it into a symbolic model checker that attempts to prove or disprove the property.

To achieve this, the symbolic model checker infers facts about the smart contract by symbolic means; that is, variables are treated as such, branch conditions impose logical constraints, formulas get simplified and rewritten, and ultimately the property is decomposed into a sequence of queries. Sometimes, even proofs by induction are performed fully automatically. The core of symbolic model checking is a set of algorithms that break the verification task down into smaller proof obligations, which can then be processed by automated theorem provers such as Z3, CVC5, and Yices2. The results from the automated theorem provers are combined to either arrive at a proof that the smart contract's model satisfies the formula, or to provide enough information to generate a counter-example that disproves the property.

This mode of operation has profound consequences for the results. Unlike other quality controls like testing or fuzzing, the analyses of a symbolic model checker use mathematical arguments to prove or disprove that a smart contract meets its expected behaviors. In that sense, the results are universal. Once a property has been proven correct, you can rely on the mathematical arguments that show that all executions of the smart contract, no matter which inputs were used or what state the blockchain was in, satisfy the property.

Conclusion

We have shown how CertiK uses formal verification to mathematically prove that the basic functionality of ERC-20 token contracts that we audit is correct. You learned how we combine our expertise in smart contract auditing and experience with formal verification to provide the highest level of security. In the next post in this series, we formally verify a widely-used ERC-20 implementation and real contracts derived from it.