There is a vast number of tokens trading on public blockchains like Ethereum and BNB Chain. CoinMarketCap tracks more than 22,000 different cryptocurrencies and tokens. This figure includes stablecoins, wrapped tokens, tokens that grant voting rights in governance protocols, tokens that represent a share of a liquidity pool, tokens that incentivize users to hold their assets by imposing fees on each transfer, and more.
Each of these tokens is implemented as a smart contract. A smart contract is a program that is published on a blockchain. These contracts range from simple to extremely complex, and can hold anywhere from a fraction of a cent of value to assets worth many billions of dollars. While strong cryptographic algorithms assure the integrity of the data stored on the blockchain, security vulnerabilities in a smart contract’s code can have devastating consequences. Code security is of paramount importance, especially in light of the fact that the code of a smart contract cannot be changed after deployment. Additionally, smart contracts are open source, meaning each line of code is publicly available once the contract is deployed. Patching security vulnerabilities over time is not an option; it’s imperative to get security right the first time around.
Reliably detecting such vulnerabilities before deployment is key to blockchain security. In addition to manual code reviews by expert security engineers, CertiK implements formal verification to mathematically prove certain features of the ERC-20 token contracts that we audit are correct. Want to learn more? You’re in the right place. If you're just getting started, check out our introduction to formal verification first. Otherwise, read on.
Following the successful launch of Ethereum, the widespread proliferation of tokens led to varying implementations and coding practices. Recognizing the problems this could cause, the Ethereum Foundation agreed on the ERC-20 standard that eases interoperability between tokens and defines the interfaces and expected behaviors of token contracts. Since September 2022, CertiK has routinely included the formal verification of ERC-20 contracts during our audits. As of the time of writing, we had formally verified ERC-20 properties for 416 contracts in 164 projects (some projects have more than one contract implementing the ERC-20 interface). Before we go into details, let’s take a quick look at some examples of ERC-20 expected behaviors.
At the core of an ERC-20 token contract are the public API functions transfer() and
transferFrom(), which implement the secure transfer of tokens from one address to another.
transfer() function lets an account holder transfer a certain amount of their own tokens to another address. The code snippet below shows the essential idea. The smart contract maintains a mapping that stores the number of tokens held by each address.
transfer() function updates the smart contract’s mapping by deducting the amount of tokens from the account that triggered the transfer and crediting it to the recipient's account. Once the update is complete, it emits an event to notify observers that the transfer has occurred.
ERC-20 compliant tokens also allow their holders to approve others to transfer certain amounts from their accounts. The other parties may be either people or smart contracts (e.g. decentralized exchanges). Let's take a closer look at a simplified version of
transferFrom(). The token contract maintains a two-dimensional mapping in a state variable
_allowances, storing the amount of each holder’s tokens each spender has at her command. For example, an entry of
_allowances[Alice][Bob] = 1000 would allow Bob to spend 1000 tokens from Alice’s account. Obviously, any transfer on behalf of someone else needs to update the involved
allowances_ entry. In our example, that is done by subtracting the number of transferred tokens from the spender's allowance.
The ERC-20 function
approve() grants an allowance of
amount tokens to the address in
Finally, the ERC-20 standard defines functions that allow us to query the state of the smart contract. The optional functions
decimals() can be used to retrieve the token’s full name, its symbol, and the number of decimal digits used in its fixed-point representation. Also, any ERC-20 token is expected to maintain a record of the number of circulating tokens. That total supply can be queried by a public API function
totalSupply(). Finally, the public functions
allowance() can be used to retrieve the balance of tokens held by an account and the allowance that one account has to spend another account's tokens.
Before we dive into how formal verification for ERC-20 token contracts works, let's pause for a moment and take a look at some of the problems CertiK has encountered during audits of ERC-20 contracts.
Sometimes, the difference between a secure ERC-20 token and a vulnerable one is a single line of code. Recall our implementation of
transferFrom() from above. Imagine that Bob attempts to transfer an amount of tokens from Alice's account using
transferFrom(). If Bob is a bad actor, he could try to transfer an amount that exceeds the allowance he has for Alice's address. (If Alice did not grant Bob any allowance, his allowance would simply be zero.) Such an attempt would fail in our example, as the Solidity compiler (as of version 0.8.0) automatically introduces over- and underflow checks when evaluating arithmetic expressions. In our example, Bob's attempt to spend more tokens than he has been granted by Alice's allowance will result in an underflow when evaluating the expression in Line 7. Such an arithmetic error triggers an exception that causes the entire transaction to revert with no effect. As a result, Bob's attempt to exceed his allowance will fail.
If a pre-0.8.0 version of the Solidity compiler were used, however, the above code would behave very differently; over- and under-flows would not be detected by the compiler generated code and would silently result in arithmetic wrap-around. In those earlier versions of the Solidity compiler, Bob's attempt to transfer an amount that is larger than his available allowance would cause the subtraction in Line 7 to underflow. If
_allowances[from][spender] is strictly less than
amount, the result would be negative. Since
_allowances maps to values of type
uint256, i.e. 256-bit unsigned integers, the bits representing the negative number from the subtraction (in two’s complement form) would erroneously be interpreted as representing a huge positive number. Not only would Bob succeed in spending Alice's tokens beyond what she approved, but the underflow in
transferFrom() would silently increase Bob's allowance to an unimaginably large number.
Another pattern that we sometimes encounter is an incorrect attempt to refactor ERC-20's transfer logic. You might have observed that
transferFrom() are rather similar. Isn't it good practice in software engineering to factor out common functionality?
While this is true in general (and correctly done in libraries like OpenZeppelin), it can go terribly wrong if not done carefully.
The code updates the balances, emits the appropriate event and performs some plausibility checks to prevent transfers to invalid addresses. Using this internal helper function, the public API functions
transferFrom() can be simplified. However, in the effort to simplify the code, the allowance update has been omitted from
transferFrom(). This allows anyone with a positive allowance to exceed the limit imposed by the tokens’ owner over her tokens by repeatedly executing transfers from her account. In that way, a malicious actor can drain the account entirely!
At CertiK, we have written property templates that precisely describe the expected behaviors of ERC-20 token contracts. These templates are generic: our tools analyze the implementation details of each smart contract and adapt the templates accordingly. In this way, we obtain a set of 38 contract-specific formal properties that, taken together, cover the basic expected behavior of an ERC-20 token contract. This includes aspects such as:
transferFrom() decreases the allowance of the sender over the spender's account by the transferred amount.
As usual when reasoning about programs, precision is key. In order to prove that a smart contract behaves according to a property, the property needs to be stated in precise, unambiguous, and machine interpretable language. Moreover, the reasoning needs to take into account the environment that a smart contract operates in. This is why CertiK incorporates a mathematical model that describes the details of transactions within a blockchain. In that model, a blockchain is understood as a database that stores smart contracts together with their data and where the data can be modified by transactions. A transaction executes the token's underlying bytecode, which in turn may modify its associated data. As in the Ethereum blockchain, transactions are grouped into blocks, where each block summarizes the changes triggered by all its transactions.
Given that the data stored in a smart contract evolves over time, it is natural to reason about the sequences of transactions that may be executed. Is there a sequence of transactions that could drive our smart contract into an insecure state?
That’s why the formalism that we use to express properties reasons about relevant state changes (the blue circles in the image below) along potentially infinite sequences of transactions:
Leveraging this insight, we can prove that under reasonable assumptions about the current state of the blockchain, a transaction that executes a public or external function of an ERC-20 token contract changes the blockchain state only as expected.
For example, let's assume that Alice and Bob have 𝒙 and 𝒚 tokens, respectively, and that Alice sends a transaction to the underlying ERC-20 token contract requesting to transfer 𝒛 tokens from her account to Bob's. Assuming that Alice's balance suffices (i.e. if 𝒛 ⩽ 𝒙), one would naturally expect that the call to
transfer() will succeed and update the blockchain state such that Alice's balance is decreased and Bob's is increased by 𝒛.
Generalizing this idea, we can write a mathematical formula that symbolic model checking algorithms can reason about. In essence, we state that whenever our client’s
transfer() function is executed, we expect that it eventually finishes and updates the balances accordingly.
This intuition can be written formally in linear temporal logic (LTL), which includes the operators
always (denoted □) and
eventually (denoted ◊). Informally, always means that a property always holds during transactions. Eventually means that a property will hold at some point in the future. Always can be used to specify properties such as "no user's balance ever becomes negative" and eventually can be used to specify properties such as "if I ask to withdraw money from a contract, I eventually receive my money."
To fully understand their meaning, imagine that you could observe the blockchain's transactions flying by and attach labels to them. Each label could summarize information about its underlying transaction such as "function has been started" or "tokens are sent from Alice to Bob". LTL formulas express properties about such label sequences. For example, assuming we labeled transactions with a predicate
alice_transfers_to_bob(y). The LTL formula
◊alice_transfers_to_bob(50) then formalizes the expectation that we will encounter a transaction now or in the future where Alice transfers 50 tokens to Bob.
Generalizing this idea towards verifying ERC-20 token behavior, let's label transactions to describe which of the contract's public API functions was
finished. Then the following formula expresses that for all future transactions, if a function
transfer() is invoked with a sufficient balance, it is expected to eventually finish in a state where the balances are updated according to the original request:
CertiK audit reports contain formal verification results for ERC-20 tokens by default. We search for contracts that implement the ERC-20 interface and translate each of those contracts into a mathematical model. That model contains a formal description of all possible executions of the smart contract on the blockchain as well as side-conditions that describe the blockchain environment (e.g. transactions, exceptions, external function calls, rollbacks, etc.).
This model is the last missing piece for formal verification. Once it is constructed, we combine it with our predefined properties that capture the expected behaviors of ERC-20 tokens and feed it into a symbolic model checker that attempts to prove or disprove the property.
To achieve this, the symbolic model checker infers facts about the smart contract by symbolic means; that is, variables are treated as such, branch conditions impose logical constraints, formulas get simplified and rewritten, and ultimately the property is decomposed into a sequence of queries. Sometimes, even proofs by induction are performed fully automatically. The core of symbolic model checking is a set of algorithms that break the verification task down into smaller proof obligations, which can then be processed by automated theorem provers such as Z3, CVC5, and Yices2. The results from the automated theorem provers are combined to either arrive at a proof that the smart contract's model satisfies the formula, or to provide enough information to generate a counter-example that disproves the property.
This mode of operation has profound consequences for the results. Unlike other quality controls like testing or fuzzing, the analyses of a symbolic model checker use mathematical arguments to prove or disprove that a smart contract meets its expected behaviors. In that sense, the results are universal. Once a property has been proven correct, you can rely on the mathematical arguments that show that all executions of the smart contract, no matter which inputs were used or what state the blockchain was in, satisfy the property.
We have shown how CertiK uses formal verification to mathematically prove that the basic functionality of ERC-20 token contracts that we audit is correct. You learned how we combine our expertise in smart contract auditing and experience with formal verification to provide the highest level of security. In the next post in this series, we formally verify a widely-used ERC-20 implementation and real contracts derived from it.