AI-Powered Code Auditing – Using ChatGPT to Capture the Flag
12/15/2022

Recently, CertiK participated in the Aptos CTF MOVEment 2022 contest. As with all Capture the Flag (CTF) contests, we faced the challenge of quickly understanding and summarizing the semantics of complex code snippets. To tackle this challenge, we turned to ChatGPT, a large language model trained by OpenAI. ChatGPT proved to be a valuable tool in this regard, as it was able to understand and explain code written in various programming languages.

In this blog, we will share our experience using ChatGPT in the Aptos CTF MOVEment 2022 contest and discuss the benefits of using this tool in CTF contests.


First Impressions

By interacting with ChatGPT in natural language, we could ask it questions about the code and receive clear, concise answers. For example, when presented with a code snippet written in Move, ChatGPT summarized the code's semantics and explained what the code does in plain language. This let us grasp the functionality of the code quickly and saved us valuable time during the contest.

Example 1: Summarizing Code Functionality

Here is how ChatGPT summarized the functionality of a code snippet provided in the CTF challenge:

public fun pow(g: u128, x: u128, p: u128): u128 {
    let ans = 1;
    g = g % p;
    while (x != 0) {
        if ((x & 1) == 1) {
            ans = ((ans % p) * (g % p)) % p;
        };
        x = x >> 1;
        g = (g * g) % p;
    };
    ans
}

ChatGPT summary: The code defines a public function named pow that takes three parameters: g, x, and p. The function computes and returns the result of raising g to the xth power, modulo p. It does this by using a loop to repeatedly square g and multiply it by ans if the least significant bit of x is 1. The result is then returned. Overall, the code provides a way to compute the modular exponentiation of g to the xth power, modulo p.

We were very impressed with the results we obtained using ChatGPT. Although the code was not particularly long, understanding and summarizing it with ChatGPT's help was much faster than reading the code directly. This illustrates the value that ChatGPT can offer inexperienced programmers: they can quickly get an overview of the code and understand its purpose and structure, which can save them a lot of time and effort.
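A quick way to sanity-check a summary like this is to run the function against a few values worked out by hand. The following is a minimal sketch rather than part of the challenge: the address 0x42, the module name pow_check, and the test name are hypothetical, and the function body is copied from the snippet above.

```move
#[test_only]
module 0x42::pow_check {
    // Copied from the challenge snippet: square-and-multiply modular exponentiation.
    fun pow(g: u128, x: u128, p: u128): u128 {
        let ans = 1;
        g = g % p;
        while (x != 0) {
            // Multiply the accumulator in whenever the current low bit of x is set.
            if ((x & 1) == 1) {
                ans = ((ans % p) * (g % p)) % p;
            };
            x = x >> 1;
            g = (g * g) % p;
        };
        ans
    }

    #[test]
    fun pow_matches_hand_computed_values() {
        assert!(pow(2, 10, 1000) == 24, 0); // 2^10 = 1024, and 1024 mod 1000 = 24
        assert!(pow(3, 0, 7) == 1, 1);      // the loop is skipped when x is 0
        assert!(pow(5, 3, 13) == 8, 2);     // 5^3 = 125, and 125 mod 13 = 8
    }
}
```

One detail the summary does not mention: Move aborts on integer overflow, so the intermediate product g * g is only guaranteed to fit in a u128 when p is at most 2^64.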

Digging Deeper

Even though Move is a relatively new language, ChatGPT was able to provide us with useful information. We decided to test its capabilities further by trying a slightly more advanced program. Here is another example, in which ChatGPT helps explain corner cases that lead to runtime failures and helps us debug while testing the code.

Example 2: Locating a Potential Runtime Error

Q: In which cases would the following code fail to execute?

    fun encrypt_string(plaintext : vector<u8>) : vector<u8> {
        let key : vector<u8> = transaction_context::get_script_hash();
        let key_len : u64 = vector::length(&key);

        let ciphertext : vector<u8> = vector::empty<u8>();

        let i = 0;
        while ( i < vector::length(&plaintext) ) {
            vector::push_back(&mut ciphertext, *vector::borrow(&plaintext, i) ^ *vector::borrow(&key, (i % key_len)));
            i = i + 1;
        };
        ciphertext
    }

ChatGPT’s answer: The code in the question may fail to execute if transaction_context::get_script_hash() returns an empty vector or if vector::length() returns 0 when called on plaintext or key. In either of these cases, the while loop in the code would not execute, and the function would simply return an empty vector.

ChatGPT did not identify the exact location of the exception, and its claim that the function would simply return an empty vector holds only when plaintext is empty. Still, it pointed at the relevant corner case: if transaction_context::get_script_hash() returns an empty vector, key_len is 0, and evaluating i % key_len for a non-empty plaintext aborts with an arithmetic error. This kind of hint is helpful for someone trying to debug the code and understand potential sources of errors.
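To make the corner case concrete, here is a minimal sketch of the same XOR loop with an explicit guard on the key length. It is an illustration under stated assumptions, not the challenge's code: the module xor_sketch, the function xor_with_key, and the constant E_EMPTY_KEY are hypothetical names, and the key is passed in as a parameter instead of being read from transaction_context::get_script_hash() so that the example stays self-contained.

```move
module 0x42::xor_sketch {
    use std::vector;

    // Hypothetical abort code for this sketch.
    const E_EMPTY_KEY: u64 = 1;

    // Same XOR loop as encrypt_string, but the key is a parameter (an assumption
    // made for illustration) and an empty key is rejected up front.
    public fun xor_with_key(plaintext: vector<u8>, key: vector<u8>): vector<u8> {
        let key_len = vector::length(&key);
        // Without this guard, `i % key_len` aborts with an arithmetic error
        // on the first iteration whenever the key is empty.
        assert!(key_len > 0, E_EMPTY_KEY);

        let ciphertext = vector::empty<u8>();
        let i = 0;
        while (i < vector::length(&plaintext)) {
            let p = *vector::borrow(&plaintext, i);
            let k = *vector::borrow(&key, i % key_len);
            vector::push_back(&mut ciphertext, p ^ k);
            i = i + 1;
        };
        ciphertext
    }

    #[test]
    fun xor_round_trips() {
        let key = b"k";
        let ciphertext = xor_with_key(b"hi", copy key);
        // XOR-ing twice with the same key restores the plaintext.
        assert!(xor_with_key(ciphertext, key) == b"hi", 0);
    }

    #[test]
    #[expected_failure(abort_code = 1)]
    fun empty_key_aborts() {
        xor_with_key(b"hi", vector::empty<u8>());
    }
}
```

The guard turns an arithmetic abort deep inside the loop into a named failure at the top of the function, which is easier to trace when debugging.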

Example 3: Handling Move Bytecode

Beyond just summarizing code semantics, we were also curious about whether ChatGPT could understand Move bytecode when it is represented in different formats. We found that ChatGPT is able to understand Move bytecode when it is written in text, but is not capable of interpreting raw bytecode instructions and providing information about their meaning or behavior.

Q: Can you explain the following bytecode?

FunctionDefinition { function: FunctionHandleIndex(1), visibility: Private, is_entry: false, acquires_global_resources: [], code: Some(CodeUnit { locals: SignatureIndex(9), code: [Call(11), StLoc(3), ImmBorrowLoc(3), CallGeneric(0), ImmBorrowLoc(3), VecLen(11), StLoc(4), VecPack(11, 0), StLoc(1), LdU64(0), StLoc(2), CopyLoc(2), ImmBorrowLoc(0), VecLen(11), Lt, BrFalse(35), Branch(17), MutBorrowLoc(1), ImmBorrowLoc(0), CopyLoc(2), VecImmBorrow(11), ReadRef, ImmBorrowLoc(3), CopyLoc(2), CopyLoc(4), Mod, VecImmBorrow(11), ReadRef, Xor, VecPushBack(11), MoveLoc(2), LdU64(1), Add, StLoc(2), Branch(11), MoveLoc(1), Ret] }) },

ChatGPT's summary: The code you provided is a function definition written in Move bytecode. The function is defined by the FunctionDefinition block, which specifies the function's function handle, its visibility (in this case, Private), whether it is an entry point (false in this case), the global resources it acquires (none in this case), and the code of the function.

The code of the function itself is a sequence of bytecode instructions, each of which performs a specific operation. For example, the Call instruction invokes a function, the StLoc instruction stores a value in a local variable, the ImmBorrowLoc instruction borrows a local variable immutably, and so on.

It is difficult to provide a detailed explanation of the functionality of the code without knowing more about the context in which it is used and the meanings of the various instructions and values. However, in general, the code appears to define a function that invokes other functions, manipulates local variables, and performs some arithmetic and logical operations. It also uses vector operations, such as VecLen and VecPushBack, and borrows local variables immutably and mutably. Overall, the code appears to be complex and it would require more information to understand its exact functionality.

This outcome is somewhat expected. ChatGPT does not offer real insight into what the function does, which is understandable: it is a general-purpose language model and is not (yet) specialized enough to understand Move bytecode. Nevertheless, it still generates a meaningful high-level description of the provided program.

Limitations

It is important to note that ChatGPT has its limitations. It can understand the basics of Move source code and provide a general explanation of its semantics, but it cannot perform detailed logic simulations or provide reliable information about the behavior of the code. In other words, ChatGPT is useful for quickly understanding code written in Move, but it should not be relied upon for more advanced tasks or for detailed analysis of the code's behavior.

Example 4: Failed Code Analysis for a Target Branch

Q: Given Initialize_balance set to 10, and the initial value of res.balance set to 10. How can we invoke the function to let res.q3 equal true? (Assume we can call the function multiple times)

public entry fun add(account: &signer, choice: u8, number: u8) acquires Challenge {
    let res = borrow_global_mut<Challenge>(signer::address_of(account));
    assert!(number <= 5, 0);
    if (choice == 1) {
        res.balance = res.balance + number;
    } else if (choice == 2) {
        res.balance = res.balance * number;
    } else if (choice == 3) {
        res.balance = res.balance << number;
    };

    if (!res.q3 && res.balance < Initialize_balance) {
        res.q3 = true;
    }
}

ChatGPT’s answer: To invoke the add function and set res.q3 to true, you can call the function with choice set to 3 and number set to a value that causes the res.balance to be less than Initialize_balance after the res.balance is shifted left by number bits.

For example, if the initial value of res.balance is 10, you can call the function with choice set to 3 and number set to 3, which will shift the res.balance left by 3 bits, resulting in a value of 80. Since this is greater than Initialize_balance, which is set to 10, the if condition in the add function will be false and res.q3 will remain false.

However, if you call the function with choice set to 3 and number set to 2, the res.balance will be shifted left by 2 bits, resulting in a value of 40. Since this is less than Initialize_balance, the if condition in the add function will be true and res.q3 will be set to true.

Therefore, to invoke the function and set res.q3 to true, you can call the add function with choice set to 3 and number set to a value that causes the res.balance to be less than Initialize_balance after the res.balance is shifted left by number.

In this case, ChatGPT fails to give the right answer: it claims that 40 is less than 10.
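For comparison, the branch can be reached by simulating the arithmetic step by step. The sketch below assumes that res.balance is a u8, which the snippet does not show; the module shift_sketch, the helper step, and the constant INITIAL_BALANCE are hypothetical stand-ins for the challenge's resource and Initialize_balance. It relies on two Move behaviors: addition and multiplication abort on overflow, while a left shift silently discards the bits shifted out.

```move
module 0x42::shift_sketch {
    // Stand-in for Initialize_balance in the challenge.
    const INITIAL_BALANCE: u8 = 10;

    // Mirrors the three branches of `add` on a bare u8.
    // The u8 type of the balance is an assumption made for this sketch.
    public fun step(balance: u8, choice: u8, number: u8): u8 {
        assert!(number <= 5, 0);
        if (choice == 1) {
            balance + number
        } else if (choice == 2) {
            balance * number
        } else if (choice == 3) {
            balance << number
        } else {
            balance
        }
    }

    #[test]
    fun two_shifts_drop_below_initial_balance() {
        // 10 << 5 = 320, which keeps only its low 8 bits: 64.
        let b = step(INITIAL_BALANCE, 3, 5);
        assert!(b == 64, 0);
        // 64 << 2 = 256, which truncates to 0.
        let b = step(b, 3, 2);
        assert!(b == 0, 1);
        assert!(b < INITIAL_BALANCE, 2);
    }

    #[test]
    fun multiplying_by_zero_also_works() {
        // number = 0 passes the `number <= 5` check, and 10 * 0 = 0.
        assert!(step(INITIAL_BALANCE, 2, 0) < INITIAL_BALANCE, 0);
    }
}
```

The second test does not depend on the width of the balance at all, since multiplying by zero yields zero for any integer type. Either path requires tracking concrete values through each call, which is exactly the kind of simulation ChatGPT got wrong here.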

Conclusion

In summary, our experience of using ChatGPT to solve CTFs has been positive. ChatGPT's ability to understand and summarize code written in Move has been particularly helpful. While it has its limitations, we believe that ChatGPT can be a useful tool for developers working with this language.

By the way, besides the answers provided by ChatGPT, part of this story was itself written with text generated by ChatGPT. Counting those answers, close to 90% of this blog's content was generated by ChatGPT. While not yet perfect, ChatGPT is an amazing application.