Auditing with ChatGPT: Complementary But Incomplete

On Dec. 23, 2022, ZKasino “hired” ChatGPT to identify potential security issues in their smart contracts. The tool raised several concerns that sounded valid on the surface.

While ChatGPT undeniably provides a valuable service to the Web3 security community, we found considerable room for improvement: ChatGPT missed a number of important vulnerabilities and reported false positives for sound code.

We hope that our insight and recommendations can help ChatGPT become an even stronger tool for securing Web3 applications. The following sections present our findings on these two types of mistakes.

What Did ChatGPT Find?

[Image: screenshot of ChatGPT's findings]

What Did ChatGPT Miss?

ChatGPT mentioned several common security concerns that can be found in many smart contract implementations. However, it failed to identify certain serious security issues, including:

Project-specific logic vulnerabilities
Inaccurate math calculations and statistical models
Inconsistencies between implementation and design intention

Vulnerability #1: Project-Specific Logic

ChatGPT failed to identify a critical vulnerability that left ZKasino open to an exploit in which attackers could consistently win and drain funds from the Bankroll contract. Players join a game by requesting randomness from Chainlink's Verifiable Random Function (VRF), and the VRF then triggers the fulfillRandomWords() function with random numbers to complete the game. ZKasino's code also allowed users to have their wagers refunded if the call to fulfillRandomWords() failed.

Figure 1: A consistent winning attack strategy

During CertiK's code review of the same smart contract code, a potentially harmful _transferPayout() invocation was discovered. The function was designed to transfer winning payouts to the player's account. An attacker can deliberately make _transferPayout() revert whenever they lose, causing the entire fulfillRandomWords() call to fail. After a waiting period of 100 blocks, the attacker can then invoke CoinFlip_Refund() to recover the wager, meaning the attacker would never lose money.

While the transfer failure issue was recognized by ChatGPT, the potential attack methods linked to the project design were not. Thus, the impact of the failure combined with the project's logic was not identified by ChatGPT. See ZKasino’s full audit report for a description of the specific attack flow.
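The refund loop described above can be illustrated with a small simulation. This is a minimal Python sketch, not ZKasino's Solidity code: the 1.98x payout, the 50% win chance, and the names play_round, honest_receiver, and attacker_receiver are assumptions made for illustration. It models only the core idea that a player contract which reverts on a losing payout gets its wager refunded instead of losing it.

```python
import random


class TransferReverted(Exception):
    """Stands in for a revert raised inside the player's receive hook."""


def honest_receiver(payout: int, wager: int) -> None:
    # An ordinary player accepts whatever payout arrives, including zero.
    return None


def attacker_receiver(payout: int, wager: int) -> None:
    # A malicious player contract reverts whenever the round is a loss.
    if payout < wager:
        raise TransferReverted()


def play_round(rng: random.Random, wager: int, receiver) -> int:
    """Return the player's net balance change for one coin flip."""
    # Hypothetical payout rule: 1.98x on a win, nothing on a loss.
    payout = 198 * wager // 100 if rng.random() < 0.5 else 0
    try:
        # Plays the role of _transferPayout() inside fulfillRandomWords().
        receiver(payout, wager)
    except TransferReverted:
        # Settlement failed, so after the waiting period the player claims
        # a refund (CoinFlip_Refund() in the article) and nets zero.
        return 0
    return payout - wager


def simulate(receiver, rounds: int = 1000, wager: int = 100, seed: int = 7):
    # The same seed gives both players the same sequence of coin flips.
    rng = random.Random(seed)
    return [play_round(rng, wager, receiver) for _ in range(rounds)]


honest = simulate(honest_receiver)
attacker = simulate(attacker_receiver)
print("honest net:", sum(honest), "attacker net:", sum(attacker))
```

Under these assumptions the honest player's results include losing rounds, while the attacker's per-round result is never negative, which is why the Bankroll can be drained over time.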

Vulnerability #2: Inaccurate Math Calculations and Statistical Models

Ensuring that outcomes are random and meet reasonable expectations is of the utmost importance in any gaming project. To confirm this, the randomness of each game outcome was thoroughly evaluated during the audit process. Though ChatGPT acknowledged the significance of this matter, it did not detect any cases of unfairness. ChatGPT brought up the use of VRF and the potential for unfair outcomes if the VRF contract is compromised or manipulated:

“If the VRF contract is not secure or is manipulated, it could potentially lead to unfair outcomes for the game.”

However, this conclusion is limited and does not address the root causes of unfairness. We found a number of potential issues regarding randomness in the course of our audit.

Unfair Randomness Distribution

One medium-severity issue concerns unfair random number usage in the VideoPoker game, where players have a lower chance of drawing certain cards.
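The audit finding itself is not reproduced here, but one common way a card game ends up with uneven odds is modulo bias: reducing a random value into a range that does not evenly divide the number of possible values. The sketch below is an assumption made for illustration and is not ZKasino's code. It maps a single random byte onto a 52-card deck and counts how many byte values land on each card.

```python
from collections import Counter

DECK_SIZE = 52       # standard deck
BYTE_VALUES = 256    # possible values of one random byte (assumed source)

# Count how many of the 256 equally likely byte values map to each card.
hits = Counter(value % DECK_SIZE for value in range(BYTE_VALUES))

most = max(hits.values())
least = min(hits.values())
favored = [card for card, n in hits.items() if n == most]
starved = [card for card, n in hits.items() if n == least]

print(len(favored), "cards are drawn with probability", most, "/ 256")
print(len(starved), "cards are drawn with probability", least, "/ 256")
```

Because 256 is not a multiple of 52, some cards receive one more byte value than others. Rejection sampling is a usual remedy for this class of bias.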

Decimal Truncation

Another issue, caused by decimal truncation, was discovered in the Dice game. It would have allowed players to choose special multipliers to maximize their expected returns.
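To see how truncation can make expected returns depend on the chosen multiplier, consider the following sketch. The roll range, the basis-point encoding of multipliers, the 99% target return, and the function names are all hypothetical; the actual formula in ZKasino's Dice contract is not reproduced here. The point is only that integer division drops a remainder, so some multipliers yield a slightly different expected return than others.

```python
from fractions import Fraction

ROLL_RANGE = 10_000       # hypothetical: rolls are integers in [0, 10000)
BP = 10_000               # multipliers in basis points (2.0x = 20000)
TARGET_RETURN_BP = 9_900  # hypothetical target: 99% expected return


def win_threshold(multiplier_bp: int) -> int:
    # Integer division, as on-chain arithmetic would do; the remainder is lost.
    return ROLL_RANGE * TARGET_RETURN_BP // multiplier_bp


def expected_return(multiplier_bp: int) -> Fraction:
    # P(win) * multiplier, computed exactly to expose the truncation effect.
    p_win = Fraction(win_threshold(multiplier_bp), ROLL_RANGE)
    return p_win * Fraction(multiplier_bp, BP)


for m in (20_000, 20_001, 30_000, 30_001):
    print(m, float(expected_return(m)))
```

In this toy model, multipliers that divide the numerator evenly return exactly 99%, while neighboring multipliers return slightly less, so a player who understands the arithmetic can pick the multipliers with the best expected return.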

Vulnerability #3: Inconsistencies Between Implementation and Intended Design

ChatGPT is often able to understand the implementation of a single function while failing to grasp the design's underlying purpose. For example, it may understand the technical execution of a certain function but be unable to place the purpose of that function in the broader context of the smart contract. To avoid mistakes in its audits, ChatGPT needs to better understand smart contract code logic. As it currently stands, ChatGPT provides a surface-level reading of the code. To take its auditing to the next level, it must be able to work backwards from a function to derive its initial logic, which is a significant task.

Incorrect Input Validation

An input validation issue was discovered in the Plinko contract that could result in incorrect multiplier settings.

According to ZKasino, the number of rows used in Plinko should be 8 to 16. However, the Bankroll contract owner can set a row number outside the expected range through the function setPlinkoMultipliers() because of a bug in the check below:

[Image: code of the input check in setPlinkoMultipliers()]

The code indicates that the transaction will revert only if both numRows and risk are invalid. If just one of the two criteria is invalid, the check still passes and the code does not revert.
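The difference between the check as described and the intended check comes down to combining the two conditions with AND instead of OR. The following Python sketch illustrates that logic; it is not the Solidity source, and the function names are hypothetical. The row bounds come from ZKasino's stated range, and the risk bound of 3 is taken from ChatGPT's description quoted in this section.

```python
class InvalidNumberToSet(Exception):
    """Mirrors the error name quoted in the article."""


def check_as_described(num_rows: int, risk: int) -> None:
    # Reverts only when BOTH inputs are invalid (the bug described above).
    if (num_rows < 8 or num_rows > 16) and risk >= 3:
        raise InvalidNumberToSet()


def check_as_intended(num_rows: int, risk: int) -> None:
    # Reverts when EITHER input is invalid.
    if (num_rows < 8 or num_rows > 16) or risk >= 3:
        raise InvalidNumberToSet()


def accepts(check, num_rows: int, risk: int) -> bool:
    """Return True if the check lets the inputs through."""
    try:
        check(num_rows, risk)
        return True
    except InvalidNumberToSet:
        return False


# Out-of-range row count combined with a valid risk value.
print(accepts(check_as_described, 20, 0))
print(accepts(check_as_intended, 20, 0))
```

With 20 rows and a valid risk value, the first check lets the inputs through while the second rejects them.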

ChatGPT gave a different answer in response to the second inquiry: “The function then checks if the value of "numRows" is between 8 and 16, and if the value of "risk" is less than 3. If either of these conditions are not met, the function reverts with the error "InvalidNumberToSet".”

ChatGPT appears to comprehend the purpose of the function. Nevertheless, it does not know how the function is meant to be used and cannot identify the real vulnerability without extra information.

Inconsistent Value Update

In the Slots contract, an issue related to an inconsistent update to totalValue was identified, which could result in the game ending prematurely. The totalValue variable was used to monitor a user's winnings or losses, but it only kept track of the payout and failed to deduct the wager, leading to an incorrect calculation of the user's gain or loss.

[Image: code of the totalValue update in the Slots contract]
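The effect of omitting the wager can be sketched as follows. This is a simplified Python model under stated assumptions, not the Slots contract itself: the stop-gain limit, the payout sequence, and the function name spins_played are hypothetical and chosen only to show how an overstated totalValue can end a multi-spin session early.

```python
def spins_played(payouts, wager, stop_gain, deduct_wager):
    """Count spins until the tracked total reaches the stop-gain limit."""
    total_value = 0
    played = 0
    for payout in payouts:
        played += 1
        # The buggy variant adds only the payout; the fixed one nets the wager.
        total_value += (payout - wager) if deduct_wager else payout
        if total_value >= stop_gain:
            break
    return played


payouts = [150, 0, 150, 0, 300]  # hypothetical spin results
buggy = spins_played(payouts, wager=100, stop_gain=200, deduct_wager=False)
fixed = spins_played(payouts, wager=100, stop_gain=200, deduct_wager=True)
print("buggy:", buggy, "fixed:", fixed)
```

In this model the payout-only total crosses the limit on the third spin even though the player's true net result at that point is zero, whereas the corrected total lets all five spins run.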

Conclusion

Despite its training, ChatGPT misses certain important security issues in its audits. This is due to the limitations of AI in fully understanding the complexities and nuances of code, as well as its lack of hands-on experience in real-world scenarios. As stated on its official website, ChatGPT is a research release that relies on natural language processing for dialogue purposes. It is often unable to understand the intent and reasoning behind the code as well as a human auditor can. As such, it is important to supplement ChatGPT's analysis with manual audits by experienced security experts to ensure accuracy.

The following summary highlights the strengths and weaknesses of human-based services and ChatGPT on various criteria.

[Image: summary table comparing human-based services and ChatGPT]

The effectiveness of ChatGPT's answers depends largely on the format of the prompt. In this blog post, we compared the results of our customer's pre-audit interactions with ChatGPT against the final audit performed by experts at CertiK. As technology improves and a clearer understanding of prompt engineering emerges, engineers will be able to make better use of ChatGPT. Keep a lookout for our future blog posts, in which we delve into the art and science of prompt engineering: posing effective questions to ChatGPT.

You can also read about our experience using ChatGPT to compete in a capture-the-flag competition.